Re: [DISCUSS] FLIP-339: Support Adaptive Partition Selection for StreamPartitioner

Rui Fan Thu, 15 Jan 2026 22:52:36 -0800

Thanks everyone for the valuable discussion!

Sounds make sense to only support rebalance and rescale in this FLIP.


Best,
Rui


On Fri, Jan 16, 2026 at 7:49 AM Rui Fan <[email protected]> wrote:

> Thanks everyone for the valuable discussion!
>
>
>
> On Fri, Jan 16, 2026 at 7:08 AM yuanfeng hu <[email protected]>
> wrote:
>
>> Thanks for the quick update and the ongoing discussion.
>>
>> I would like to +1 for focusing only on rebalance and rescale and
>> excluding shuffle from the adaptive selection logic for now.
>>
>> My reasoning is that when a user explicitly chooses the shuffle()
>> partitioner, it often implies a requirement for statistical randomness.
>> Introducing adaptive logic based on downstream load would inherently break
>> this randomness. In certain scenarios, such as Machine Learning (ML) data
>> preparation or specific statistical sampling tasks, the uniform randomness
>> provided by shuffle is a functional requirement rather than just a
>> distribution strategy.
>>
>> By only supporting rebalance and rescale, we respect the user's explicit
>> intent for those specific operators while avoiding potential side effects
>> in workloads that rely on the stochastic nature of shuffling.
>>
>>
>>
>> Best regards,
>>
>> Yuanfeng
>>
>>
>> > 2026年1月16日 11:30，Yuepeng Pan <[email protected]> 写道：
>> >
>> > Thanks Rui Ran and Zhanghao Chen for the comments.
>> >
>> >> Agree with Rui that we should exclude customPartition as it may bound
>> to
>> > the specific downstream subtask depending on the implementation.
>> >> IIUC, the customPartition is bound to the specific downstream subtask,
>> > right?
>> >
>> > Yes, You're right. Sorry for the dirt from the previous design. I
>> deleted
>> > the related description from the main design.
>> >
>> >> [From Hao]:  I'm also hesitated on whether we should cover shuffle as
>> > well. Shuffle partitioner just
>> >> randomly chooses a downstream task, and the idea of adaptivity breaks
>> > the randomness.
>> >> I'm not sure if any one would ever rely on the randomness, looking
>> > forward to community ideas on it.
>> >
>> >> [From Rui]: could you please explain in FLIP how to find the suitable
>> > downstream
>> >> subtask for different partitioners?
>> >
>> > In theory, the adaptive modes of RescalePartitioner and
>> > RebalancePartitioner can share the same logic when searching for the
>> > optimal partition, as shown in [1].
>> >
>> > Regarding whether to support the adaptive shuffle() partitioning mode, I
>> > take a neutral stance—either supporting it or not is acceptable. One of
>> the
>> > main reasons is that we should try to avoid introducing yet another
>> > parameter to control whether adaptivity is enabled for shuffle(), since
>> we
>> > already have quite a few parameters. Introducing a new parameter only to
>> > support a small number of scenarios without sufficient user feedback
>> does
>> > not seem necessary.
>> >
>> > If we do decide to support it, the strategy for finding the optimal
>> > partition in adaptive shuffle() would be as follows: perform random
>> > selection for the number of times specified by
>> > taskmanager.network.adaptive-partitioner.max-traverse-size, record the
>> list
>> > of partition candidates, and then choose, as the final downstream task
>> to
>> > write data to, the task whose downstream data partition currently has
>> the
>> > least buffered data.
>> >
>> > Looking forward to your comments.
>> >
>> > [1]
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425181#FLIP339:SupportAdaptivePartitionSelectionforStreamPartitioner-IntroducetheAdaptiveLoadBasedRecordWriter
>> >
>> >
>> > Best regards,
>> > Yuepeng Pan
>> >
>> >
>> >
>> >
>> > Zhanghao Chen <[email protected]> 于2026年1月16日周五 10:49写道：
>> >
>> >> Thanks Yuepeng and Rui for the discussion. Agree with Rui that we
>> should
>> >> exclude customPartition as it may bound to the specific downstream
>> subtask
>> >> depending on the implementation. I'm also hesitated on whether we
>> should
>> >> cover shuffle as well. Shuffle partitioner just randomly chooses a
>> >> downstream task, and the idea of adaptivity breaks the randomness. I'm
>> not
>> >> sure if any one would ever rely on the randomness, looking forward to
>> >> community ideas on it.
>> >>
>> >> Best,
>> >> Zhanghao Chen
>> >> ________________________________
>> >> From: Rui Fan <[email protected]>
>> >> Sent: Thursday, January 15, 2026 16:31
>> >> To: [email protected] <[email protected]>
>> >> Subject: Re: [DISCUSS] FLIP-339: Support Adaptive Partition Selection
>> for
>> >> StreamPartitioner
>> >>
>> >> Thanks Yuepeng for the update.
>> >>
>> >>> For [(Potential)Data not bound to subtask] scenarios (4 partition
>> >> strategies: *Rebalance, Rescale, Shuffle, customPartition*),
>> >>> adjusting dynamically the subtask of the received data according to
>> the
>> >> processing load of the downstream operator would be
>> >>> good to achieve the effect of peak-shaving and valley-filling, and
>> try to
>> >> ensure the throughput of flink jobs.
>> >>
>> >> IIUC, the customPartition is bound to the specific downstream subtask,
>> >> right?
>> >>
>> >> Also, could you please explain in FLIP how to find the suitable
>> downstream
>> >> subtask for different partitioners?
>> >> Specifically
>> >>
>> >> Best,
>> >> Rui
>> >>
>> >> On Thu, Jan 15, 2026 at 7:37 AM Yuepeng Pan <[email protected]>
>> >> wrote:
>> >>
>> >>> Hi, devs.
>> >>>
>> >>> FYI about the review comments from Mang Zhang(Thanks):
>> >>>
>> >>>> Hi, Yuepeng,
>> >>>>
>> >>>> Thanks!
>> >>>>
>> >>>> Regarding FLIP-339, the current design is exactly what I originally
>> >>> envisioned.
>> >>>> Whether using Flink JAR or SQL, the cost to users is low, and it is
>> more
>> >>> versatile.
>> >>>> LGTM
>> >>>>
>> >>>
>> >>>> --
>> >>>
>> >>>> Best regards,
>> >>>
>> >>>> Mang Zhang
>> >>>
>> >>>
>> >>> Best regards,
>> >>>
>> >>> Yuepeng Pan
>> >>>
>> >>>
>> >>>
>> >>> Yuepeng Pan <[email protected]> 于2025年12月16日周二 12:02写道：
>> >>>
>> >>>> Thanks Mang for the review.
>> >>>>
>> >>>> Since the names of the subtitles are somewhat ambiguous and may lead
>> to
>> >>>> misunderstandings about not preserving the original interface,
>> >>>> I have made some corresponding change[1].
>> >>>>
>> >>>> Hope this helps. Please take a look if you had the free time.
>> >>>> Thank you.
>> >>>>
>> >>>> [1]
>> >>>>
>> >>>
>> >>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263425181#FLIP339:SupportAdaptivePartitionSelectionforStreamPartitioner-IntroducethenewmethodsatAPIlevel
>> >>>> :
>> >>>>
>> >>>> Best regards,
>> >>>> Yuepeng Pan
>> >>>>
>> >>>> Mang Zhang <[email protected]> 于2025年12月16日周二 10:53写道：
>> >>>>
>> >>>>> hi Yuepeng
>> >>>>> Thank you for continuing to drive this FLIP forward.
>> >>>>> Regarding the changes to the DataStream API in FLIP, I haven't
>> >> observed
>> >>>>> any forward compatibility with the original API. Could you explain
>> the
>> >>>>> rationale behind this design choice?
>> >>>>> Forward-compatible APIs reduce the cost for users to upgrade Flink
>> >>>>> versions.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>>
>> >>>>> Best regards,
>> >>>>> Mang Zhang
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> At 2025-12-15 12:40:00, "Pan Yuepeng" <[email protected]>
>> wrote:
>> >>>>>> Hi devs,
>> >>>>>>
>> >>>>>> I re-sorted out and supplemented the 'FLIP-339[1] Support Adaptive
>> >>>>>> Partition Selection for StreamPartitioner' based on Flink JIRA[2].
>> >>>>>>
>> >>>>>> Flink offers multiple partition strategies, some of which bind data
>> >> to
>> >>>>>> downstream subtasks, while others do not (e.g., shuffle, rescale,
>> >>>>>> rebalance).
>> >>>>>> For [Data not bound to subtasks] scenarios, overloaded
>> sub-task-nodes
>> >>> may
>> >>>>>> slow down the processing of Flink jobs, leading to backpressure and
>> >>> data
>> >>>>>> lag. Dynamically adjusting the partition of data to subtasks based
>> on
>> >>> the
>> >>>>>> processing load of downstream operators helps achieve a
>> peak-shaving
>> >>> and
>> >>>>>> valley-filling effect, thereby striving to maintain the throughput
>> of
>> >>>>> Flink
>> >>>>>> jobs.
>> >>>>>>
>> >>>>>> The raw discussions could be found in the Flink JIRA[2].
>> >>>>>> I really appreciate developers involved in the discussion for the
>> >>>>> valuable
>> >>>>>> help and suggestions in advance.
>> >>>>>>
>> >>>>>> Please refer to the FLIP[1] wiki for more details about the
>> proposed
>> >>>>> design
>> >>>>>> and implementation.
>> >>>>>>
>> >>>>>> Welcome any feedback and opinions on this proposal.
>> >>>>>>
>> >>>>>> [1] https://cwiki.apache.org/confluence/x/nYyzDw
>> >>>>>> [2] https://issues.apache.org/jira/browse/FLINK-31655
>> >>>>>>
>> >>>>>> Best regards,
>> >>>>>> Yuepeng Pan
>> >>>>>
>> >>>>
>> >>>
>> >>
>>
>>

Re: [DISCUSS] FLIP-339: Support Adaptive Partition Selection for StreamPartitioner

Reply via email to