Hi Alex,

Thanks again for summarizing the situation! +1 for option 2, taking into account that, AFAIK, a Pekko release just happened.

I think that if there is consensus to move forward with Netty 4, it would be worth creating a follow-up improvement right away to bump Pekko as soon as a release with the newly added isolated config is available.

Best,
Ferenc

On Monday, January 20th, 2025 at 16:06, Alexander Fedulov <alexander.fedu...@gmail.com> wrote:

> Hi Ferenc,
>
> Impacting data exchange performance is definitely not an option. I'm simply not entirely sure whether Flink's network stack delegates buffer allocation to Netty or handles it "manually."
> That said, even if we could confirm that this part is not impacted, setting the global configuration is still not a good idea. We already have the HDFS filesystem dependency inside the framework that depends on Netty 4, plus potentially multiple externalized connectors.
>
> Let me summarize the options we have, knowing that allocator control localized to Pekko is currently not available:
>
> 1) Revert the changes
> Pros:
> - Unblocks the release
> Cons:
> - Leaves us with 20 unaddressed critical CVEs
>
> 2) Upgrade to Netty 4 and let the RPC module run with Netty's default settings (also bump the limits for the memory-restricted test, as done on master):
> Pros:
> - Resolves the CVEs
> - Unblocks the release
> Cons:
> - Higher memory usage, with the risk of tipping containers already at their memory limit into OOM
> - Unclear whether the fractional auto-allocation of Flink memory needs to be adjusted to the new defaults
>
> 3) Help the colleagues working on Pekko confirm that the settings patch fulfills our purposes [1] and wait for a new Pekko release that allows isolated Netty settings solely for flink-rpc:
> Pros:
> - Resolves the CVEs
> - Memory usage similar to Netty 3, reducing the risk of OOM surprises for users
> Cons:
> - The new Pekko release could take time
>
> [1] https://github.com/apache/pekko/pull/1709#issuecomment-2599698240
>
> Unless we can hope for a timely Pekko release, my vote would
be to go ahead > with option 2 and clearly document the potential need for increasing > container memory limits for the sake of improved security. > > Best regards, > Alex > > On Mon, 20 Jan 2025 at 14:11, Ferenc Csaky ferenc.cs...@pm.me.invalid > > wrote: > > > Since `flink-runtime` uses Netty4 for quite a while, I believe > > enforcing UNPOOLED will affect shuffle performance. I did not > > performed actual tests comparing Netty3 and Netty4 in this regard, > > so I cannot back this with actual numbers, but I think losing > > shuffle performance would affect more real-world use-cases and be a > > bigger problem, than a bit more overall memory consumption for > > RPC communication. > > > > To cover highly resource-limited use-cases where it might be useful > > to spare some memory and performance is not critical, I would > > suggest to document these options in the release notes and in the > > product documentation as well. I already created a ticket for > > that [1], so I plan to deliver it in the next couple days. > > > > WDYT? > > > > [1] https://issues.apache.org/jira/browse/FLINK-37099 > > > > On Monday, January 20th, 2025 at 13:10, ConradJam jam.gz...@gmail.com > > wrote: > > > > > +1 > > > > > > Alexis Sarda-Espinosa sarda.espin...@gmail.com 于2025年1月20日周一 18:38写道: > > > > > > > Hello, > > > > > > > > what about io.netty.maxDirectMemory [1]? Is it relevant? I haven't been > > > > able to understand exactly how much that changes, but I find it odd > > > > that, > > > > for the default, <"practical max direct memory" would be 2 * max > > > > memory as > > > > defined by the JDK>. > > > > > > > > [1] > > > > https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L162 > > > > > > Regards, > > > > Alexis. > > > > > > > > Am Mo., 20. Jan. 
2025 um 04:53 Uhr schrieb He Pin he...@apache.org: > > > > > > > > > I think so, not sure how Flink works, but if they share the same key > > > > > and > > > > > running in the same JVM process, which can be. > > > > > > > > > > On 2025/01/18 16:58:15 Alexander Fedulov wrote: > > > > > > > > > > > @He Pin, > > > > > > Thanks for bringing this up. > > > > > > So, if I understand correctly, the problem is that there is > > > > > > currently > > > > > > no > > > > > > way to control the underlying allocator exclusively for Pekko. > > > > > > Setting > > > > > > `-Dio.netty.allocator.type=unpooled` would impact Netty's behavior > > > > > > across > > > > > > other parts of the framework. > > > > > > > > > > > > Does anyone know if this could potentially affect the data exchange > > > > > > network > > > > > > stack in `flink-runtime`, which is also based on Netty? > > > > > > > > > > > > Best, > > > > > > Alex > > > > > > > > > > > > On Sat, 18 Jan 2025 at 04:10, He Pin he...@apache.org wrote: > > > > > > > > > > > > > > +1 for Netty4 with UNPOOLED memory allocator to not change the > > > > > > > > default > > > > > > > > memory footprint. > > > > > > > > > > > > > > That can only be done with another release, otherwise if will > > > > > > > reduce > > > > > > > the > > > > > > > performance. > > > > > > > > > > > > > > see https://github.com/apache/pekko/pull/1709 > > > > > > > > > > > > > > On 2025/01/17 17:05:06 Maximilian Michels wrote: > > > > > > > > > > > > > > > +1 for Netty4 with UNPOOLED memory allocator to not change the > > > > > > > > default > > > > > > > > memory footprint. > > > > > > > > > > > > > > > > -Max > > > > > > > > > > > > > > > > On Fri, Jan 17, 2025 at 1:15 PM Samrat Deb > > > > > > > > decordea...@gmail.com > > > > > > > > wrote: > > > > > > > > > > > > > > > > > +1 to move to netty4. 
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Samrat
> > > > > > > > >
> > > > > > > > > On Fri, 17 Jan 2025 at 5:30 PM, Luke Chen <show...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the summary!
> > > > > > > > > >
> > > > > > > > > > +1 to upgrading Pekko to get Netty 4 in the 1.19.2 and 1.20.1 releases.
> > > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > > > Luke
> > > > > > > > > >
> > > > > > > > > > On Fri, Jan 17, 2025 at 7:50 PM He Pin <he...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > +1 to Netty 4
> > > > > > > > > > >
> > > > > > > > > > > On 2025/01/16 15:12:40 Alexander Fedulov wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi all,
> > > > > > > > > > > >
> > > > > > > > > > > > We have one remaining blocker for the 1.19.2 and 1.20.1 releases, namely the issue associated with ticket FLINK-36510: "Upgrade Pekko from 1.0.1 to 1.1.2" [1]. Here is the context:
> > > > > > > > > > > >
> > > > > > > > > > > > - The flink-rpc module is currently based on Pekko 1.0.1, which bundles Netty version 3.10.6. Netty 3.10.6 is the last 3.x release and officially reached EOL more than eight years ago. It contains at least 20 known critical vulnerabilities [2].
> > > > > > > > > > > > - FLINK-36510 [1] upgrades flink-rpc to Pekko 1.1.2, which introduces a long-awaited migration to Netty 4.x.
> > > > > > > > > > > > - Memory allocation in Netty 4.x differs from Netty 3.x and has a larger memory footprint with default settings [3].
> > > > > > > > > > > > - Norman Maurer, Netty's project lead, strongly recommends moving away from Netty 3 as soon as possible [4].
> > > > > > > > > > > > - According to Norman, setting -Dio.netty.allocator.type=unpooled should approximate Netty 3's memory behavior at the expense of the performance improvements that Netty 4 would otherwise provide. That said, Netty 4 with -Dio.netty.allocator.type=unpooled is not expected to perform worse than Netty 3.
> > > > > > > > > > > > - Although this change might seem too substantial for a patch release, I propose proceeding with it due to the accumulated risks of staying on Netty 3.10.6. This will need to be addressed in a 1.20 patch release anyway, given that 1.20 is designated as LTS, and we can expect Netty 3 to accrue even more CVEs over time.
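To give a rough feel for the "larger memory footprint with default settings" point, here is a back-of-the-envelope sketch of how many pooled direct arenas Netty 4.1's PooledByteBufAllocator creates by default, and what they may grow to. It assumes Netty 4.1 defaults as we understand them: page size 8192 bytes and maxOrder 9, i.e. 4 MiB chunks (older 4.1 releases used 16 MiB chunks), and the default arena formula min(2 * cores, maxDirectMemory / chunkSize / 2 / 3). Per the comment next to that formula in Netty's source, it assumes about three chunks per arena so the pool stays under roughly half of the memory limit. The class and method names are hypothetical, and the result is an estimate of what the pool may allocate and retain under load, not a measurement of flink-rpc.

```java
/**
 * Hypothetical back-of-the-envelope model of Netty 4.1's default pooled direct arenas.
 * Assumes pageSize = 8192 and maxOrder = 9 (4 MiB chunks); these, and the arena count,
 * can be overridden via io.netty.allocator.* system properties.
 */
final class PooledArenaEstimate {
    static final long PAGE_SIZE = 8192L;
    static final int MAX_ORDER = 9;
    static final long CHUNK_SIZE = PAGE_SIZE << MAX_ORDER; // 4 MiB

    /** Default direct arena count: min(2 * cores, maxDirectMemory / chunkSize / 2 / 3), integer division. */
    static int defaultDirectArenas(int cores, long maxDirectMemory) {
        long byMemory = maxDirectMemory / CHUNK_SIZE / 2 / 3;
        return (int) Math.max(0L, Math.min(2L * cores, byMemory));
    }

    /** Rough footprint if each arena ends up holding three chunks (the sizing assumption above). */
    static long roughFootprintBytes(int cores, long maxDirectMemory) {
        return defaultDirectArenas(cores, maxDirectMemory) * 3L * CHUNK_SIZE;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        int cores = Runtime.getRuntime().availableProcessors();
        long limit = 1024L * mib; // hypothetical 1 GiB direct-memory limit
        System.out.printf("arenas=%d, rough footprint=%d MiB%n",
                defaultDirectArenas(cores, limit), roughFootprintBytes(cores, limit) / mib);
    }
}
```

For example, 8 cores with a 1 GiB limit gives 16 arenas and roughly 192 MiB, while 2 cores with 64 MiB gives 2 arenas and roughly 24 MiB. The limit Netty plugs in here is its own view of max direct memory, which may also be influenced by io.netty.maxDirectMemory; we have not verified those details.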
> > > > > > > > > > > >
> > > > > > > > > > > > You can find more details in the ongoing discussion [5].
> > > > > > > > > > > >
> > > > > > > > > > > > Looking forward to hearing the community's thoughts on whether we should proceed with the proposed changes.
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-36510
> > > > > > > > > > > > [2] https://mvnrepository.com/artifact/io.netty/netty/3.10.6.Final
> > > > > > > > > > > > [3] https://issues.apache.org/jira/browse/FLINK-36510?focusedCommentId=17911219&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17911219
> > > > > > > > > > > > [4] https://github.com/apache/flink/pull/25866#issuecomment-2595168560
> > > > > > > > > > > > [5] https://github.com/apache/flink/pull/25866
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Alex
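He Pin's caveat "if they share the same key" is the crux of why this thread treats `-Dio.netty.allocator.type` as a global switch rather than a Pekko-only one. Below is a simplified, hypothetical model (not Netty's actual code) of how Netty 4.1 picks its default allocator, as we understand it: a single lookup of a JVM-wide system property, lower-cased and trimmed, where "unpooled" selects the unpooled allocator and "pooled" or any unrecognized value falls back to pooled. Android defaults and logging are omitted. The consumer names in `main` are illustrative only. The sketch assumes every component reads the same, non-relocated key; a shaded copy of Netty may read a relocated property name, which we have not checked for Flink's or Pekko's builds.

```java
import java.util.Locale;
import java.util.Properties;

/**
 * Hypothetical, simplified model of Netty 4.1's default allocator selection:
 * one lookup of the JVM-wide system property "io.netty.allocator.type".
 */
final class AllocatorTypeSketch {
    enum AllocatorType { POOLED, UNPOOLED }

    static AllocatorType resolve(Properties systemProperties) {
        String type = systemProperties
                .getProperty("io.netty.allocator.type", "pooled")
                .toLowerCase(Locale.ROOT)
                .trim();
        // "unpooled" selects the unpooled allocator; "pooled" and unknown values fall back to pooled.
        return "unpooled".equals(type) ? AllocatorType.UNPOOLED : AllocatorType.POOLED;
    }

    public static void main(String[] args) {
        // Assumption: all of these read the same, non-relocated key from the same JVM's properties,
        // so they all end up with the same allocator type.
        AllocatorType shared = resolve(System.getProperties());
        String[] consumers = {"flink-rpc (Pekko)", "flink-runtime network stack", "HDFS client"};
        for (String consumer : consumers) {
            System.out.println(consumer + " -> " + shared);
        }
    }
}
```

Under that assumption there is no way to give flink-rpc a different value than the shuffle stack through this property alone, which is why the thread looks to a Pekko-level setting (pekko PR 1709) instead.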