+1 for option 2 I suggest we simply move forward with the release candidates
Gyula Sent from my iPhone > On 20 Jan 2025, at 19:00, He Pin <he...@apache.org> wrote: > > I think @pjfanning means you configured the pekko version with the 1.2.0 > snapshot > https://repository.apache.org/content/groups/snapshots/org/apache/pekko/pekko-actor_2.13/1.2.0-M0+55-a75bc7a7-SNAPSHOT/ > > with `unpooled-heap` allocator type. > > and change the failing test with 7MB to see if it succeeds, then he is OK > with releasing it in 1.1.4 > > But I think option 2 will not cause many problems in real production, who > will run a Flink with 7MB? Even my phone has 16GB RAM. > >> On 2025/01/20 15:06:09 Alexander Fedulov wrote: >> Hi Ferenc, >> >> Impacting data exchange performance is definitely not an option. I'm simply >> not entirely sure if Flink's network stack delegates buffer allocation to >> Netty or handles it "manually." >> That said, even if we could confirm that this part is not impacted, setting >> the global configuration is still not a good idea. We already have the HDFS >> filesystem dependency inside the framework that depends on Netty 4, plus >> potentially multiple externalized connectors. >> >> Let me summarize the options we have, knowing that the allocator control >> localized to Pekko is currently not available: >> >> 1) Revert the changes >> Pros: >> - Unblocks the release >> Cons: >> - Leaves us with 20 unaddressed critical CVEs >> >> 2) Upgrade to Netty 4 and let the RPC module run with Netty's default >> settings (also bump the limits for the memory-restricted test, as done on >> master): >> Pros: >> - Resolves CVEs >> - Unblocks the release >> Cons: >> - Higher memory usage with the risk of tipping containers already at >> their memory limit into OOM >> - Unclear if the fractional autoallocation of Flink memory needs to >> be adjusted according to the new defaults >> >> 3) Help the colleagues working on Pekko to confirm that the patch for >> settings fulfills our purposes [1] and wait for a new Pekko release that >> allows isolated Netty settings solely for flink-rpc: >> Pros: >> - Resolves the CVEs >> - Memory usage similar to Netty 3, reducing the risk of OOM surprises >> for users >> Cons: >> - Could take time for the new Pekko release >> >> [1] https://github.com/apache/pekko/pull/1709#issuecomment-2599698240 >> >> Unless we can hope for a timely Pekko release, my vote would be to go ahead >> with option 2 and clearly document the potential need for increasing >> container memory limits for the sake of improved security. >> >> Best regards, >> Alex >> >> On Mon, 20 Jan 2025 at 14:11, Ferenc Csaky <ferenc.cs...@pm.me.invalid> >> wrote: >> >>> Since `flink-runtime` uses Netty4 for quite a while, I believe >>> enforcing UNPOOLED will affect shuffle performance. I did not >>> performed actual tests comparing Netty3 and Netty4 in this regard, >>> so I cannot back this with actual numbers, but I think losing >>> shuffle performance would affect more real-world use-cases and be a >>> bigger problem, than a bit more overall memory consumption for >>> RPC communication. >>> >>> To cover highly resource-limited use-cases where it might be useful >>> to spare some memory and performance is not critical, I would >>> suggest to document these options in the release notes and in the >>> product documentation as well. I already created a ticket for >>> that [1], so I plan to deliver it in the next couple days. >>> >>> WDYT? >>> >>> [1] https://issues.apache.org/jira/browse/FLINK-37099 >>> >>> >>> >>> On Monday, January 20th, 2025 at 13:10, ConradJam <jam.gz...@gmail.com> >>> wrote: >>> >>>> >>>> >>>> +1 >>>> >>>> Alexis Sarda-Espinosa sarda.espin...@gmail.com 于2025年1月20日周一 18:38写道: >>>> >>>>> Hello, >>>>> >>>>> what about io.netty.maxDirectMemory [1]? Is it relevant? I haven't been >>>>> able to understand exactly how much that changes, but I find it odd >>> that, >>>>> for the default, <"practical max direct memory" would be 2 * max >>> memory as >>>>> defined by the JDK>. >>>>> >>>>> [1] >>>>> >>>>> >>> https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L162 >>>>> >>>>> Regards, >>>>> Alexis. >>>>> >>>>> Am Mo., 20. Jan. 2025 um 04:53 Uhr schrieb He Pin he...@apache.org: >>>>> >>>>>> I think so, not sure how Flink works, but if they share the same key >>> and >>>>>> running in the same JVM process, which can be. >>>>>> >>>>>> On 2025/01/18 16:58:15 Alexander Fedulov wrote: >>>>>> >>>>>>> @He Pin, >>>>>>> Thanks for bringing this up. >>>>>>> So, if I understand correctly, the problem is that there is >>> currently >>>>>>> no >>>>>>> way to control the underlying allocator exclusively for Pekko. >>> Setting >>>>>>> `-Dio.netty.allocator.type=unpooled` would impact Netty's behavior >>>>>>> across >>>>>>> other parts of the framework. >>>>>>> >>>>>>> Does anyone know if this could potentially affect the data exchange >>>>>>> network >>>>>>> stack in `flink-runtime`, which is also based on Netty? >>>>>>> >>>>>>> Best, >>>>>>> Alex >>>>>>> >>>>>>> On Sat, 18 Jan 2025 at 04:10, He Pin he...@apache.org wrote: >>>>>>> >>>>>>>>> +1 for Netty4 with UNPOOLED memory allocator to not change the >>>>>>>>> default >>>>>>>>> memory footprint. >>>>>>>> >>>>>>>> That can only be done with another release, otherwise if will >>> reduce >>>>>>>> the >>>>>>>> performance. >>>>>>>> >>>>>>>> see https://github.com/apache/pekko/pull/1709 >>>>>>>> >>>>>>>> On 2025/01/17 17:05:06 Maximilian Michels wrote: >>>>>>>> >>>>>>>>> +1 for Netty4 with UNPOOLED memory allocator to not change the >>>>>>>>> default >>>>>>>>> memory footprint. >>>>>>>>> >>>>>>>>> -Max >>>>>>>>> >>>>>>>>> On Fri, Jan 17, 2025 at 1:15 PM Samrat Deb >>> decordea...@gmail.com >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> +1 to move to netty4. >>>>>>>>>> >>>>>>>>>> bests, >>>>>>>>>> Samrat >>>>>>>>>> >>>>>>>>>> On Fri, 17 Jan 2025 at 5:30 PM, Luke Chen show...@gmail.com >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Thanks for the summary! >>>>>>>>>>> >>>>>>>>>>> +1 to upgrade Pekko to have netty 4 in 1.19.2 and 1.20.1 >>>>>>>>>>> releases. >>>>>>>>>>> >>>>>>>>>>> Thanks. >>>>>>>>>>> Luke >>>>>>>>>>> >>>>>>>>>>> On Fri, Jan 17, 2025 at 7:50 PM He Pin he...@apache.org >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> +1 to Netty 4 >>>>>>>>>>>> >>>>>>>>>>>> On 2025/01/16 15:12:40 Alexander Fedulov wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi all, >>>>>>>>>>>>> >>>>>>>>>>>>> We have one remaining blocker for the 1.19.2 and 1.20.1 >>>>>>>>>>>>> releases, >>>>>>>>>>>>> namely >>>>>>>>>>>>> the issue associated with ticket FLINK-36510: "Upgrade >>>>>>>>>>>>> Pekko >>>>>>>>>>>>> from >>>>>>>>>>>>> 1.0.1 >>>>>>>>>>>>> to >>>>>>>>>>>>> 1.1.2" [1]. Here is the context: >>>>>>>>>>>>> >>>>>>>>>>>>> - The flink-rpc module is currently based on Pekko >>>>>>>>>>>>> 1.0.1, >>>>>>>>>>>>> which >>>>>>>>>>>>> bundles >>>>>>>>>>>>> Netty version 3.10.6. Netty 3.10.6 is the last 3.x >>>>>>>>>>>>> release and >>>>>>>>>>>>> officially >>>>>>>>>>>>> reached EOL more than eight years ago. It contains at >>>>>>>>>>>>> least >>>>>>>>>>>>> 20 known >>>>>>>>>>>>> critical vulnerabilities [2]. >>>>>>>>>>>>> - FLINK-36510 [1] upgrades flink-rpc to Pekko 1.1.2, >>>>>>>>>>>>> which >>>>>>>>>>>>> introduces >>>>>>>>>>>>> a >>>>>>>>>>>>> long-awaited migration to Netty 4.x. >>>>>>>>>>>>> - Memory allocation in Netty 4.x differs from Netty 3.x >>>>>>>>>>>>> and >>>>>>>>>>>>> has a >>>>>>>>>>>>> larger >>>>>>>>>>>>> memory footprint with default settings [3]. >>>>>>>>>>>>> - Norman Mauerer, Netty's project lead, strongly >>>>>>>>>>>>> recommends >>>>>>>>>>>>> moving >>>>>>>>>>>>> away >>>>>>>>>>>>> from Netty 3 as soon as possible [4]. >>>>>>>>>>>>> - According to Norman, setting >>>>>>>>>>>>> -Dio.netty.allocator.type=unpooled >>>>>>>>>>>>> should >>>>>>>>>>>>> approximate Netty 3's memory behavior at the expense of >>>>>>>>>>>>> performance >>>>>>>>>>>>> improvements that Netty 4 would otherwise provide. That >>>>>>>>>>>>> said, >>>>>>>>>>>>> Netty >>>>>>>>>>>>> 4 >>>>>>>>>>>>> with >>>>>>>>>>>>> -Dio.netty.allocator.type=unpooled is not expected to >>>>>>>>>>>>> perform >>>>>>>>>>>>> worse >>>>>>>>>>>>> than >>>>>>>>>>>>> Netty 3. >>>>>>>>>>>>> - Although this change might seem too substantial for a >>>>>>>>>>>>> patch >>>>>>>>>>>>> release, I >>>>>>>>>>>>> propose proceeding with it due to the accumulated risks >>>>>>>>>>>>> of >>>>>>>>>>>>> staying >>>>>>>>>>>>> on >>>>>>>>>>>>> Netty >>>>>>>>>>>>> 3.10.6. This will need to be addressed in a 1.20 as a >>>>>>>>>>>>> patch >>>>>>>>>>>>> release >>>>>>>>>>>>> anyway, >>>>>>>>>>>>> given that 1.20 is designated as LTS, and we can expect >>>>>>>>>>>>> Netty >>>>>>>>>>>>> 3 to >>>>>>>>>>>>> accrue >>>>>>>>>>>>> even more CVEs over time. >>>>>>>>>>>>> >>>>>>>>>>>>> Here you can find more details of the ongoing >>> discussion >>>>>>>>>>>>> [5]. >>>>>>>>>>>>> >>>>>>>>>>>>> Looking forward to hearing the community's thoughts on >>>>>>>>>>>>> whether we >>>>>>>>>>>>> should >>>>>>>>>>>>> proceed with the proposed changes. >>>>>>>>>>>>> >>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-36510 >>>>>>>>>>>>> [2] >>>>>>>>>>>>> >>> https://mvnrepository.com/artifact/io.netty/netty/3.10.6.Final >>>>>>>>>>>>> [3] >>>>> >>>>> >>> https://issues.apache.org/jira/browse/FLINK-36510?focusedCommentId=17911219&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17911219 >>>>> >>>>>>>>>>>>> [4] >>>>>>>>>>>>> >>> https://github.com/apache/flink/pull/25866#issuecomment-2595168560 >>>>>>>>>>>>> [5] https://github.com/apache/flink/pull/25866 >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Alex >>> >>