+1 for option 2

I suggest we simply move forward with the release candidates

Gyula

> On 20 Jan 2025, at 19:00, He Pin <he...@apache.org> wrote:
> 
> I think @pjfanning means that you configure the Pekko version to the 1.2.0
> snapshot
> https://repository.apache.org/content/groups/snapshots/org/apache/pekko/pekko-actor_2.13/1.2.0-M0+55-a75bc7a7-SNAPSHOT/
> with the `unpooled-heap` allocator type, and try the failing test with 7MB
> to see if it succeeds. If it does, he is OK with releasing it in 1.1.4.
> 
> But I think option 2 will not cause many problems in real production. Who
> will run Flink with 7MB? Even my phone has 16GB RAM.
> 
>> On 2025/01/20 15:06:09 Alexander Fedulov wrote:
>> Hi Ferenc,
>> 
>> Impacting data exchange performance is definitely not an option. I'm simply
>> not entirely sure if Flink's network stack delegates buffer allocation to
>> Netty or handles it "manually."
>> That said, even if we could confirm that this part is not impacted, setting
>> the global configuration is still not a good idea. We already have the HDFS
>> filesystem dependency inside the framework that depends on Netty 4, plus
>> potentially multiple externalized connectors.
>> 
>> Let me summarize the options we have, knowing that the allocator control
>> localized to Pekko is currently not available:
>> 
>> 1) Revert the changes
>>    Pros:
>>      - Unblocks the release
>>    Cons:
>>      - Leaves us with 20 unaddressed critical CVEs
>> 
>> 2) Upgrade to Netty 4 and let the RPC module run with Netty's default
>> settings (also bump the limits for the memory-restricted test, as done on
>> master):
>>    Pros:
>>      - Resolves CVEs
>>      - Unblocks the release
>>    Cons:
>>      - Higher memory usage with the risk of tipping containers already at
>> their memory limit into OOM
>>      - Unclear if the fractional autoallocation of Flink memory needs to
>> be adjusted according to the new defaults
>> 
>> 3) Help the colleagues working on Pekko to confirm that the patch for
>> settings fulfills our purposes [1] and wait for a new Pekko release that
>> allows isolated Netty settings solely for flink-rpc:
>>    Pros:
>>      - Resolves the CVEs
>>      - Memory usage similar to Netty 3, reducing the risk of OOM surprises
>> for users
>>    Cons:
>>      - Could take time for the new Pekko release
>> 
>> [1] https://github.com/apache/pekko/pull/1709#issuecomment-2599698240
>> 
>> Unless we can hope for a timely Pekko release, my vote would be to go ahead
>> with option 2 and clearly document the potential need for increasing
>> container memory limits for the sake of improved security.
>> 
>> Best regards,
>> Alex
>> 
>> On Mon, 20 Jan 2025 at 14:11, Ferenc Csaky <ferenc.cs...@pm.me.invalid>
>> wrote:
>> 
>>> Since `flink-runtime` has used Netty 4 for quite a while, I believe
>>> enforcing UNPOOLED will affect shuffle performance. I have not
>>> performed actual tests comparing Netty 3 and Netty 4 in this regard,
>>> so I cannot back this with actual numbers, but I think losing
>>> shuffle performance would affect more real-world use cases and be a
>>> bigger problem than a bit more overall memory consumption for
>>> RPC communication.
>>> 
>>> To cover highly resource-limited use cases where it might be useful
>>> to spare some memory and performance is not critical, I would
>>> suggest documenting these options in the release notes and in the
>>> product documentation as well. I already created a ticket for
>>> that [1], and I plan to deliver it in the next couple of days.
>>> 
>>> WDYT?
>>> 
>>> [1] https://issues.apache.org/jira/browse/FLINK-37099
>>> 
>>> 
>>> 
>>> On Monday, January 20th, 2025 at 13:10, ConradJam <jam.gz...@gmail.com>
>>> wrote:
>>> 
>>>> 
>>>> 
>>>> +1
>>>> 
>>>> On Mon, 20 Jan 2025 at 18:38, Alexis Sarda-Espinosa sarda.espin...@gmail.com wrote:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> what about io.netty.maxDirectMemory [1]? Is it relevant? I haven't been
>>>>> able to understand exactly how much that changes, but I find it odd that,
>>>>> for the default, <"practical max direct memory" would be 2 * max memory as
>>>>> defined by the JDK>.
>>>>> 
>>>>> [1] https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L162
>>>>> 
>>>>> Regards,
>>>>> Alexis.
>>>>> 
>>>>> On Mon, 20 Jan 2025 at 04:53, He Pin he...@apache.org wrote:
>>>>> 
>>>>>> I think so. I am not sure how Flink works, but it can happen if they
>>>>>> share the same key and run in the same JVM process.
>>>>>> 
>>>>>> On 2025/01/18 16:58:15 Alexander Fedulov wrote:
>>>>>> 
>>>>>>> @He Pin,
>>>>>>> Thanks for bringing this up.
>>>>>>> So, if I understand correctly, the problem is that there is currently
>>>>>>> no way to control the underlying allocator exclusively for Pekko.
>>>>>>> Setting `-Dio.netty.allocator.type=unpooled` would impact Netty's
>>>>>>> behavior across other parts of the framework.
>>>>>>> 
>>>>>>> Does anyone know if this could potentially affect the data exchange
>>>>>>> network stack in `flink-runtime`, which is also based on Netty?
>>>>>>> 
>>>>>>> Best,
>>>>>>> Alex
>>>>>>> 
>>>>>>> On Sat, 18 Jan 2025 at 04:10, He Pin he...@apache.org wrote:
>>>>>>> 
>>>>>>>>> +1 for Netty4 with UNPOOLED memory allocator to not change the
>>>>>>>>> default
>>>>>>>>> memory footprint.
>>>>>>>> 
>>>>>>>> That can only be done with another release; otherwise it will reduce
>>>>>>>> performance.
>>>>>>>> 
>>>>>>>> see https://github.com/apache/pekko/pull/1709
>>>>>>>> 
>>>>>>>> On 2025/01/17 17:05:06 Maximilian Michels wrote:
>>>>>>>> 
>>>>>>>>> +1 for Netty4 with UNPOOLED memory allocator to not change the
>>>>>>>>> default
>>>>>>>>> memory footprint.
>>>>>>>>> 
>>>>>>>>> -Max
>>>>>>>>> 
>>>>>>>>> On Fri, Jan 17, 2025 at 1:15 PM Samrat Deb decordea...@gmail.com
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> +1 to move to netty4.
>>>>>>>>>> 
>>>>>>>>>> bests,
>>>>>>>>>> Samrat
>>>>>>>>>> 
>>>>>>>>>> On Fri, 17 Jan 2025 at 5:30 PM, Luke Chen show...@gmail.com
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Thanks for the summary!
>>>>>>>>>>> 
>>>>>>>>>>> +1 to upgrade Pekko to have netty 4 in 1.19.2 and 1.20.1
>>>>>>>>>>> releases.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks.
>>>>>>>>>>> Luke
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Jan 17, 2025 at 7:50 PM He Pin he...@apache.org
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> +1 to Netty 4
>>>>>>>>>>>> 
>>>>>>>>>>>> On 2025/01/16 15:12:40 Alexander Fedulov wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We have one remaining blocker for the 1.19.2 and 1.20.1 releases,
>>>>>>>>>>>>> namely the issue associated with ticket FLINK-36510: "Upgrade Pekko
>>>>>>>>>>>>> from 1.0.1 to 1.1.2" [1]. Here is the context:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - The flink-rpc module is currently based on Pekko 1.0.1, which
>>>>>>>>>>>>>   bundles Netty version 3.10.6. Netty 3.10.6 is the last 3.x release
>>>>>>>>>>>>>   and officially reached EOL more than eight years ago. It contains
>>>>>>>>>>>>>   at least 20 known critical vulnerabilities [2].
>>>>>>>>>>>>> - FLINK-36510 [1] upgrades flink-rpc to Pekko 1.1.2, which introduces
>>>>>>>>>>>>>   a long-awaited migration to Netty 4.x.
>>>>>>>>>>>>> - Memory allocation in Netty 4.x differs from Netty 3.x and has a
>>>>>>>>>>>>>   larger memory footprint with default settings [3].
>>>>>>>>>>>>> - Norman Maurer, Netty's project lead, strongly recommends moving
>>>>>>>>>>>>>   away from Netty 3 as soon as possible [4].
>>>>>>>>>>>>> - According to Norman, setting -Dio.netty.allocator.type=unpooled
>>>>>>>>>>>>>   should approximate Netty 3's memory behavior at the expense of
>>>>>>>>>>>>>   performance improvements that Netty 4 would otherwise provide. That
>>>>>>>>>>>>>   said, Netty 4 with -Dio.netty.allocator.type=unpooled is not
>>>>>>>>>>>>>   expected to perform worse than Netty 3.
>>>>>>>>>>>>> - Although this change might seem too substantial for a patch
>>>>>>>>>>>>>   release, I propose proceeding with it due to the accumulated risks
>>>>>>>>>>>>>   of staying on Netty 3.10.6. This will need to be addressed in a
>>>>>>>>>>>>>   1.20 patch release anyway, given that 1.20 is designated as LTS,
>>>>>>>>>>>>>   and we can expect Netty 3 to accrue even more CVEs over time.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here you can find more details of the ongoing discussion [5].
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Looking forward to hearing the community's thoughts on whether we
>>>>>>>>>>>>> should proceed with the proposed changes.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-36510
>>>>>>>>>>>>> [2] https://mvnrepository.com/artifact/io.netty/netty/3.10.6.Final
>>>>>>>>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-36510?focusedCommentId=17911219&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17911219
>>>>>>>>>>>>> [4] https://github.com/apache/flink/pull/25866#issuecomment-2595168560
>>>>>>>>>>>>> [5] https://github.com/apache/flink/pull/25866
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Alex
>>> 
>> 
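The concern raised in the thread, that `-Dio.netty.allocator.type=unpooled` cannot be scoped to Pekko alone, comes down to JVM system properties being process-wide. The sketch below uses only the JDK and does not call Netty. The class `AllocatorPropertyScope`, the method `allocatorTypeSeenBy`, and the component names are hypothetical, and the "pooled" fallback is an assumption about the default. It also assumes every component reads the same, unrelocated property key.

```java
import java.util.List;

final class AllocatorPropertyScope {
    // Property key quoted in the thread.
    static final String KEY = "io.netty.allocator.type";

    // Hypothetical stand-in for any component that reads the property
    // at startup. The "pooled" fallback is an assumed default.
    static String allocatorTypeSeenBy(String component) {
        return component + "=" + System.getProperty(KEY, "pooled");
    }

    public static void main(String[] args) {
        // Equivalent to starting the JVM with -Dio.netty.allocator.type=unpooled.
        System.setProperty(KEY, "unpooled");

        // Every reader in the same JVM that uses this key observes the same value.
        for (String c : List.of("flink-rpc", "flink-runtime shuffle", "hdfs client")) {
            System.out.println(allocatorTypeSeenBy(c));
        }
    }
}
```

Under these assumptions, all three components report `unpooled`, which is why the thread looks for an allocator setting local to Pekko instead of a global flag.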

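On the `io.netty.maxDirectMemory` question, the quoted remark that the "practical max direct memory" would be 2 * the JDK maximum can be read as two independent budgets of the same size: one tracked by Netty and one tracked by the JDK. That reading is an assumption, not something confirmed in the thread. The sketch below only works through the arithmetic. `DirectMemoryEstimate` and `practicalMaxDirectBytes` are hypothetical names, and using `Runtime.maxMemory()` as the JDK-side limit assumes `-XX:MaxDirectMemorySize` is unset.

```java
final class DirectMemoryEstimate {
    // Upper bound under the assumed reading: Netty's own budget inherits the
    // JDK limit, and the JDK budget applies separately to other direct buffers.
    static long practicalMaxDirectBytes(long jdkMaxDirectBytes) {
        long nettyBudget = jdkMaxDirectBytes; // assumed: inherited from the JDK limit
        long jdkBudget = jdkMaxDirectBytes;   // buffers accounted for by the JDK itself
        return Math.addExact(nettyBudget, jdkBudget);
    }

    public static void main(String[] args) {
        // Assumption: with -XX:MaxDirectMemorySize unset, the JDK limit is
        // roughly the maximum heap size reported by the runtime.
        long jdkMax = Runtime.getRuntime().maxMemory();
        System.out.println("JDK-side limit (bytes): " + jdkMax);
        System.out.println("Practical upper bound (bytes): " + practicalMaxDirectBytes(jdkMax));
    }
}
```

If this reading holds, the figure matters for option 2's container-memory risk, because off-heap usage could exceed what a single JDK-level limit suggests.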