Hi Ferenc,

Impacting data exchange performance is definitely not an option. I'm simply not entirely sure whether Flink's network stack delegates buffer allocation to Netty or handles it "manually." That said, even if we could confirm that this part is not impacted, setting the global configuration is still not a good idea. We already have the HDFS filesystem dependency inside the framework that depends on Netty 4, plus potentially multiple externalized connectors.

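To make the concern about the global setting concrete, here is a minimal sketch using only the JDK. It assumes that each Netty 4 consumer resolves its allocator from the JVM-wide `io.netty.allocator.type` system property and that the default is "pooled" (my understanding of Netty 4.1 on non-Android platforms). The class, the method, and the component names are hypothetical stand-ins, not Flink, Pekko, or Netty APIs.

```java
// Hypothetical illustration: a JVM system property has no per-module scope.
class AllocatorPropertyScope {
    // Property name taken from this thread.
    static final String ALLOCATOR_KEY = "io.netty.allocator.type";

    // Stand-in for any library that picks its allocator from the
    // system property; "pooled" as the fallback is an assumption.
    static String resolveAllocator(String component) {
        String type = System.getProperty(ALLOCATOR_KEY, "pooled");
        return component + "=" + type;
    }

    public static void main(String[] args) {
        // Similar in effect to starting the JVM with
        // -Dio.netty.allocator.type=unpooled
        System.setProperty(ALLOCATOR_KEY, "unpooled");

        // Every component in the same process observes the same value.
        String[] components = {"flink-rpc", "hdfs-client", "connector"};
        for (String component : components) {
            System.out.println(resolveAllocator(component));
        }
    }
}
```

The point of the sketch is only that the flag cannot be confined to flink-rpc: whatever reads the same key in the same process gets the same answer, which is why an allocator setting local to Pekko would be preferable.
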
Let me summarize the options we have, knowing that allocator control localized to Pekko is currently not available:

1) Revert the changes
Pros:
- Unblocks the release
Cons:
- Leaves us with at least 20 unaddressed critical CVEs

2) Upgrade to Netty 4 and let the RPC module run with Netty's default settings (also bump the limits for the memory-restricted test, as done on master)
Pros:
- Resolves the CVEs
- Unblocks the release
Cons:
- Higher memory usage, with the risk of tipping containers already at their memory limit into OOM
- Unclear whether the fractional auto-allocation of Flink memory needs to be adjusted according to the new defaults

3) Help the colleagues working on Pekko to confirm that the patch for settings fulfills our purposes [1] and wait for a new Pekko release that allows isolated Netty settings solely for flink-rpc
Pros:
- Resolves the CVEs
- Memory usage similar to Netty 3, reducing the risk of OOM surprises for users
Cons:
- Could take time for the new Pekko release

[1] https://github.com/apache/pekko/pull/1709#issuecomment-2599698240

Unless we can hope for a timely Pekko release, my vote would be to go ahead with option 2 and clearly document the potential need for increasing container memory limits for the sake of improved security.

Best regards,
Alex

On Mon, 20 Jan 2025 at 14:11, Ferenc Csaky <ferenc.cs...@pm.me.invalid> wrote:

> Since `flink-runtime` has used Netty 4 for quite a while, I believe
> enforcing UNPOOLED will affect shuffle performance. I have not
> performed actual tests comparing Netty 3 and Netty 4 in this regard,
> so I cannot back this with actual numbers, but I think losing
> shuffle performance would affect more real-world use cases and be a
> bigger problem than a bit more overall memory consumption for
> RPC communication.
>
> To cover highly resource-limited use cases where it might be useful
> to spare some memory and performance is not critical, I would
> suggest documenting these options in the release notes and in the
> product documentation as well. I already created a ticket for
> that [1], so I plan to deliver it in the next couple of days.
>
> WDYT?
>
> [1] https://issues.apache.org/jira/browse/FLINK-37099
>
> On Monday, January 20th, 2025 at 13:10, ConradJam <jam.gz...@gmail.com> wrote:
>
> > +1
> >
> > On Mon, 20 Jan 2025 at 18:38, Alexis Sarda-Espinosa sarda.espin...@gmail.com wrote:
> >
> > > Hello,
> > >
> > > what about io.netty.maxDirectMemory [1]? Is it relevant? I haven't been
> > > able to understand exactly how much that changes, but I find it odd that,
> > > for the default, <"practical max direct memory" would be 2 * max memory as
> > > defined by the JDK>.
> > >
> > > [1] https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L162
> > >
> > > Regards,
> > > Alexis.
> > >
> > > On Mon, 20 Jan 2025 at 04:53, He Pin he...@apache.org wrote:
> > >
> > > > I think so. I'm not sure how Flink works, but that can happen if they
> > > > share the same key and run in the same JVM process.
> > > >
> > > > On 2025/01/18 16:58:15 Alexander Fedulov wrote:
> > > >
> > > > > @He Pin,
> > > > > Thanks for bringing this up.
> > > > > So, if I understand correctly, the problem is that there is currently no
> > > > > way to control the underlying allocator exclusively for Pekko. Setting
> > > > > `-Dio.netty.allocator.type=unpooled` would impact Netty's behavior across
> > > > > other parts of the framework.
> > > > >
> > > > > Does anyone know if this could potentially affect the data exchange network
> > > > > stack in `flink-runtime`, which is also based on Netty?
> > > > >
> > > > > Best,
> > > > > Alex
> > > > >
> > > > > On Sat, 18 Jan 2025 at 04:10, He Pin he...@apache.org wrote:
> > > > >
> > > > > > > +1 for Netty4 with UNPOOLED memory allocator to not change the default
> > > > > > > memory footprint.
> > > > > >
> > > > > > That can only be done with another release, otherwise it will reduce the
> > > > > > performance.
> > > > > >
> > > > > > see https://github.com/apache/pekko/pull/1709
> > > > > >
> > > > > > On 2025/01/17 17:05:06 Maximilian Michels wrote:
> > > > > >
> > > > > > > +1 for Netty4 with UNPOOLED memory allocator to not change the default
> > > > > > > memory footprint.
> > > > > > >
> > > > > > > -Max
> > > > > > >
> > > > > > > On Fri, Jan 17, 2025 at 1:15 PM Samrat Deb decordea...@gmail.com wrote:
> > > > > > >
> > > > > > > > +1 to move to netty4.
> > > > > > > >
> > > > > > > > bests,
> > > > > > > > Samrat
> > > > > > > >
> > > > > > > > On Fri, 17 Jan 2025 at 5:30 PM, Luke Chen show...@gmail.com wrote:
> > > > > > > >
> > > > > > > > > Thanks for the summary!
> > > > > > > > >
> > > > > > > > > +1 to upgrade Pekko to have netty 4 in 1.19.2 and 1.20.1 releases.
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > > Luke
> > > > > > > > >
> > > > > > > > > On Fri, Jan 17, 2025 at 7:50 PM He Pin he...@apache.org wrote:
> > > > > > > > >
> > > > > > > > > > +1 to Netty 4
> > > > > > > > > >
> > > > > > > > > > On 2025/01/16 15:12:40 Alexander Fedulov wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > We have one remaining blocker for the 1.19.2 and 1.20.1 releases, namely
> > > > > > > > > > > the issue associated with ticket FLINK-36510: "Upgrade Pekko from 1.0.1
> > > > > > > > > > > to 1.1.2" [1].
> > > > > > > > > > > Here is the context:
> > > > > > > > > > >
> > > > > > > > > > > - The flink-rpc module is currently based on Pekko 1.0.1, which bundles
> > > > > > > > > > >   Netty version 3.10.6. Netty 3.10.6 is the last 3.x release and officially
> > > > > > > > > > >   reached EOL more than eight years ago. It contains at least 20 known
> > > > > > > > > > >   critical vulnerabilities [2].
> > > > > > > > > > > - FLINK-36510 [1] upgrades flink-rpc to Pekko 1.1.2, which introduces a
> > > > > > > > > > >   long-awaited migration to Netty 4.x.
> > > > > > > > > > > - Memory allocation in Netty 4.x differs from Netty 3.x and has a larger
> > > > > > > > > > >   memory footprint with default settings [3].
> > > > > > > > > > > - Norman Maurer, Netty's project lead, strongly recommends moving away
> > > > > > > > > > >   from Netty 3 as soon as possible [4].
> > > > > > > > > > > - According to Norman, setting -Dio.netty.allocator.type=unpooled should
> > > > > > > > > > >   approximate Netty 3's memory behavior at the expense of performance
> > > > > > > > > > >   improvements that Netty 4 would otherwise provide. That said, Netty 4
> > > > > > > > > > >   with -Dio.netty.allocator.type=unpooled is not expected to perform worse
> > > > > > > > > > >   than Netty 3.
> > > > > > > > > > > - Although this change might seem too substantial for a patch release, I
> > > > > > > > > > >   propose proceeding with it due to the accumulated risks of staying on
> > > > > > > > > > >   Netty 3.10.6. This will need to be addressed in a 1.20 patch release
> > > > > > > > > > >   anyway, given that 1.20 is designated as LTS, and we can expect Netty 3
> > > > > > > > > > >   to accrue even more CVEs over time.
> > > > > > > > > > >
> > > > > > > > > > > Here you can find more details of the ongoing discussion [5].
> > > > > > > > > > >
> > > > > > > > > > > Looking forward to hearing the community's thoughts on whether we should
> > > > > > > > > > > proceed with the proposed changes.
> > > > > > > > > > >
> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-36510
> > > > > > > > > > > [2] https://mvnrepository.com/artifact/io.netty/netty/3.10.6.Final
> > > > > > > > > > > [3] https://issues.apache.org/jira/browse/FLINK-36510?focusedCommentId=17911219&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17911219
> > > > > > > > > > > [4] https://github.com/apache/flink/pull/25866#issuecomment-2595168560
> > > > > > > > > > > [5] https://github.com/apache/flink/pull/25866
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Alex