Re: [DISCUSS] FLIP-339: Support Adaptive Partition Selection for StreamPartitioner

Yuepeng Pan Thu, 15 Jan 2026 22:57:26 -0800

Thanks all of you involved in the discussion.

I will initiate a vote as soon as possible.



Best regards,

Yuepeng Pan

Rui Fan <[email protected]> 于2026年1月16日周五 14:52写道：

> Thanks everyone for the valuable discussion!
>
> Sounds make sense to only support rebalance and rescale in this FLIP.
>
> Best,
> Rui
>
>
> On Fri, Jan 16, 2026 at 7:49 AM Rui Fan <[email protected]> wrote:
>
> > Thanks everyone for the valuable discussion!
> >
> >
> >
> > On Fri, Jan 16, 2026 at 7:08 AM yuanfeng hu <[email protected]>
> > wrote:
> >
> >> Thanks for the quick update and the ongoing discussion.
> >>
> >> I would like to +1 for focusing only on rebalance and rescale and
> >> excluding shuffle from the adaptive selection logic for now.
> >>
> >> My reasoning is that when a user explicitly chooses the shuffle()
> >> partitioner, it often implies a requirement for statistical randomness.
> >> Introducing adaptive logic based on downstream load would inherently
> break
> >> this randomness. In certain scenarios, such as Machine Learning (ML)
> data
> >> preparation or specific statistical sampling tasks, the uniform
> randomness
> >> provided by shuffle is a functional requirement rather than just a
> >> distribution strategy.
> >>
> >> By only supporting rebalance and rescale, we respect the user's explicit
> >> intent for those specific operators while avoiding potential side
> effects
> >> in workloads that rely on the stochastic nature of shuffling.
> >>
> >>
> >>
> >> Best regards,
> >>
> >> Yuanfeng
> >>
> >>
> >> > 2026年1月16日 11:30，Yuepeng Pan <[email protected]> 写道：
> >> >
> >> > Thanks Rui Ran and Zhanghao Chen for the comments.
> >> >
> >> >> Agree with Rui that we should exclude customPartition as it may bound
> >> to
> >> > the specific downstream subtask depending on the implementation.
> >> >> IIUC, the customPartition is bound to the specific downstream
> subtask,
> >> > right?
> >> >
> >> > Yes, You're right. Sorry for the dirt from the previous design. I
> >> deleted
> >> > the related description from the main design.
> >> >
> >> >> [From Hao]:  I'm also hesitated on whether we should cover shuffle as
> >> > well. Shuffle partitioner just
> >> >> randomly chooses a downstream task, and the idea of adaptivity breaks
> >> > the randomness.
> >> >> I'm not sure if any one would ever rely on the randomness, looking
> >> > forward to community ideas on it.
> >> >
> >> >> [From Rui]: could you please explain in FLIP how to find the suitable
> >> > downstream
> >> >> subtask for different partitioners?
> >> >
> >> > In theory, the adaptive modes of RescalePartitioner and
> >> > RebalancePartitioner can share the same logic when searching for the
> >> > optimal partition, as shown in [1].
> >> >
> >> > Regarding whether to support the adaptive shuffle() partitioning
> mode, I
> >> > take a neutral stance—either supporting it or not is acceptable. One
> of
> >> the
> >> > main reasons is that we should try to avoid introducing yet another
> >> > parameter to control whether adaptivity is enabled for shuffle(),
> since
> >> we
> >> > already have quite a few parameters. Introducing a new parameter only
> to
> >> > support a small number of scenarios without sufficient user feedback
> >> does
> >> > not seem necessary.
> >> >
> >> > If we do decide to support it, the strategy for finding the optimal
> >> > partition in adaptive shuffle() would be as follows: perform random
> >> > selection for the number of times specified by
> >> > taskmanager.network.adaptive-partitioner.max-traverse-size, record the
> >> list
> >> > of partition candidates, and then choose, as the final downstream task
> >> to
> >> > write data to, the task whose downstream data partition currently has
> >> the
> >> > least buffered data.
> >> >
> >> > Looking forward to your comments.
> >> >
> >> > [1]
> >> >
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425181#FLIP339:SupportAdaptivePartitionSelectionforStreamPartitioner-IntroducetheAdaptiveLoadBasedRecordWriter
> >> >
> >> >
> >> > Best regards,
> >> > Yuepeng Pan
> >> >
> >> >
> >> >
> >> >
> >> > Zhanghao Chen <[email protected]> 于2026年1月16日周五 10:49写道：
> >> >
> >> >> Thanks Yuepeng and Rui for the discussion. Agree with Rui that we
> >> should
> >> >> exclude customPartition as it may bound to the specific downstream
> >> subtask
> >> >> depending on the implementation. I'm also hesitated on whether we
> >> should
> >> >> cover shuffle as well. Shuffle partitioner just randomly chooses a
> >> >> downstream task, and the idea of adaptivity breaks the randomness.
> I'm
> >> not
> >> >> sure if any one would ever rely on the randomness, looking forward to
> >> >> community ideas on it.
> >> >>
> >> >> Best,
> >> >> Zhanghao Chen
> >> >> ________________________________
> >> >> From: Rui Fan <[email protected]>
> >> >> Sent: Thursday, January 15, 2026 16:31
> >> >> To: [email protected] <[email protected]>
> >> >> Subject: Re: [DISCUSS] FLIP-339: Support Adaptive Partition Selection
> >> for
> >> >> StreamPartitioner
> >> >>
> >> >> Thanks Yuepeng for the update.
> >> >>
> >> >>> For [(Potential)Data not bound to subtask] scenarios (4 partition
> >> >> strategies: *Rebalance, Rescale, Shuffle, customPartition*),
> >> >>> adjusting dynamically the subtask of the received data according to
> >> the
> >> >> processing load of the downstream operator would be
> >> >>> good to achieve the effect of peak-shaving and valley-filling, and
> >> try to
> >> >> ensure the throughput of flink jobs.
> >> >>
> >> >> IIUC, the customPartition is bound to the specific downstream
> subtask,
> >> >> right?
> >> >>
> >> >> Also, could you please explain in FLIP how to find the suitable
> >> downstream
> >> >> subtask for different partitioners?
> >> >> Specifically
> >> >>
> >> >> Best,
> >> >> Rui
> >> >>
> >> >> On Thu, Jan 15, 2026 at 7:37 AM Yuepeng Pan <[email protected]>
> >> >> wrote:
> >> >>
> >> >>> Hi, devs.
> >> >>>
> >> >>> FYI about the review comments from Mang Zhang(Thanks):
> >> >>>
> >> >>>> Hi, Yuepeng,
> >> >>>>
> >> >>>> Thanks!
> >> >>>>
> >> >>>> Regarding FLIP-339, the current design is exactly what I originally
> >> >>> envisioned.
> >> >>>> Whether using Flink JAR or SQL, the cost to users is low, and it is
> >> more
> >> >>> versatile.
> >> >>>> LGTM
> >> >>>>
> >> >>>
> >> >>>> --
> >> >>>
> >> >>>> Best regards,
> >> >>>
> >> >>>> Mang Zhang
> >> >>>
> >> >>>
> >> >>> Best regards,
> >> >>>
> >> >>> Yuepeng Pan
> >> >>>
> >> >>>
> >> >>>
> >> >>> Yuepeng Pan <[email protected]> 于2025年12月16日周二 12:02写道：
> >> >>>
> >> >>>> Thanks Mang for the review.
> >> >>>>
> >> >>>> Since the names of the subtitles are somewhat ambiguous and may
> lead
> >> to
> >> >>>> misunderstandings about not preserving the original interface,
> >> >>>> I have made some corresponding change[1].
> >> >>>>
> >> >>>> Hope this helps. Please take a look if you had the free time.
> >> >>>> Thank you.
> >> >>>>
> >> >>>> [1]
> >> >>>>
> >> >>>
> >> >>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425181#FLIP339:SupportAdaptivePartitionSelectionforStreamPartitioner-IntroducethenewmethodsatAPIlevel
> >> >>>> :
> >> >>>>
> >> >>>> Best regards,
> >> >>>> Yuepeng Pan
> >> >>>>
> >> >>>> Mang Zhang <[email protected]> 于2025年12月16日周二 10:53写道：
> >> >>>>
> >> >>>>> hi Yuepeng
> >> >>>>> Thank you for continuing to drive this FLIP forward.
> >> >>>>> Regarding the changes to the DataStream API in FLIP, I haven't
> >> >> observed
> >> >>>>> any forward compatibility with the original API. Could you explain
> >> the
> >> >>>>> rationale behind this design choice?
> >> >>>>> Forward-compatible APIs reduce the cost for users to upgrade Flink
> >> >>>>> versions.
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> --
> >> >>>>>
> >> >>>>> Best regards,
> >> >>>>> Mang Zhang
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> At 2025-12-15 12:40:00, "Pan Yuepeng" <[email protected]>
> >> wrote:
> >> >>>>>> Hi devs,
> >> >>>>>>
> >> >>>>>> I re-sorted out and supplemented the 'FLIP-339[1] Support
> Adaptive
> >> >>>>>> Partition Selection for StreamPartitioner' based on Flink
> JIRA[2].
> >> >>>>>>
> >> >>>>>> Flink offers multiple partition strategies, some of which bind
> data
> >> >> to
> >> >>>>>> downstream subtasks, while others do not (e.g., shuffle, rescale,
> >> >>>>>> rebalance).
> >> >>>>>> For [Data not bound to subtasks] scenarios, overloaded
> >> sub-task-nodes
> >> >>> may
> >> >>>>>> slow down the processing of Flink jobs, leading to backpressure
> and
> >> >>> data
> >> >>>>>> lag. Dynamically adjusting the partition of data to subtasks
> based
> >> on
> >> >>> the
> >> >>>>>> processing load of downstream operators helps achieve a
> >> peak-shaving
> >> >>> and
> >> >>>>>> valley-filling effect, thereby striving to maintain the
> throughput
> >> of
> >> >>>>> Flink
> >> >>>>>> jobs.
> >> >>>>>>
> >> >>>>>> The raw discussions could be found in the Flink JIRA[2].
> >> >>>>>> I really appreciate developers involved in the discussion for the
> >> >>>>> valuable
> >> >>>>>> help and suggestions in advance.
> >> >>>>>>
> >> >>>>>> Please refer to the FLIP[1] wiki for more details about the
> >> proposed
> >> >>>>> design
> >> >>>>>> and implementation.
> >> >>>>>>
> >> >>>>>> Welcome any feedback and opinions on this proposal.
> >> >>>>>>
> >> >>>>>> [1] https://cwiki.apache.org/confluence/x/nYyzDw
> >> >>>>>> [2] https://issues.apache.org/jira/browse/FLINK-31655
> >> >>>>>>
> >> >>>>>> Best regards,
> >> >>>>>> Yuepeng Pan
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >>
> >>
>

Re: [DISCUSS] FLIP-339: Support Adaptive Partition Selection for StreamPartitioner

Reply via email to