Thank you very much for your response and proposal. Since the scope of this email and the fact that I am not familiar with scheduling strategies in batch processing. I could only leave a few preliminary comments on whether your proposal could address the issues observed when streaming jobs enable both the Adaptive Scheduler and the AutoScaler.
In my limited reading, the issue described in FLINK-33123[1] occurs during the execution phase, where the JobGraph has already been generated. This phase is after the stage where your proposed fix takes effect. As a result, the following risks may still remain: - Jobs(enabled the Adaptive Scheduler and the AutoScaler) may still face the issue where upstream and downstream JobVertex instances are connected by Forward edges but have inconsistent parallelism. - The proposed fix seems to apply only to the dynamic graph generation mode of adaptive batch scheduling, and therefore may not be effective for the non-dynamic graph generation logic involved in FLINK-33123. In addition, the discussion thread and external references in FLINK-33123 already explain why ForwardPartitioner → RebalancePartitioner is used instead of ForwardPartitioner → RescalePartitioner in the streaming jobs(enabled the Adaptive Scheduler and the AutoScaler) Given this, I guess that it might be more appropriate to address batch mode and streaming mode as two separate cases, especially since the current scheduling modes are implemented by different schedulers. Of course, if a single, elegant solution can be shared between both bug-cases, that would be ideal. However, this would likely require some costs that are difficult to estimate at the moment, such as having sufficient efforts to carry out investigation, proof-of-concept work, and subsequent reviews. Achieving all of these in a short time frame may not be easy. Please correct me if I'm wrong. Thank you very much. BTW, If you are confident that the issue you mentioned is still not fixed in the latest version repo[2] of Flink and that no developers are currently working on it, you are more than welcome to report it in Jira[3] and try to address it. [1] https://issues.apache.org/jira/browse/FLINK-33123 [2] https://github.com/apache/flink [3] https://issues.apache.org/jira/projects/FLINK/issues Best regards, Yuepeng Pan Bonnie Arogyam Varghese via dev <[email protected]> 于2026年1月15日周四 00:56写道: > Hi Yuepeng Pan and other community members, > I have also seen a similar issue however wrt to batch. The eventual fix > could be leveraged for both streaming and batch modes. > > I have described the issue observed in batch mode and can be reproduced > with the following 2 ways: > 1. Disable operator chaining completely > 2. "pipeline.operator > -chaining.chain-operators-with-different-max-parallelism": "false" // > Default is true > > I have captured my findings and experimented with a fix for batch mode: > > https://docs.google.com/document/d/1TTj2ddlQTfDgtGb0ISmiKWt6R9U4RxJ59o6bULC1YtI/edit?usp=sharing > > > > On Tue, Jan 13, 2026 at 7:47 AM Yuepeng Pan <[email protected]> > wrote: > > > Hi community, > > > > I would like to start a discussion around the issue described in > > **FLINK-33123[1]**. > > > > This issue can mainly be broken down into two parts: > > a). > > Assuming that initially two upstream and downstream JobVertices connected > > by a FORWARD edge have the same parallelism, > > due to a rescale operation their parallelism becomes different. > > In this case, the current strategy may produce incorrect results when > > rebuilding the upstream–downstream network partition connections. > > b). > > Assuming that the parallelism of two upstream and downstream JobVertices > is > > different, > > but due to a rescale operation their parallelism needs to be adjusted to > be > > the same. > > In this scenario, it is not possible to determine the partition type > after > > the rescale. > > > > So, I'd like to share a design proposal[2] that attempts to address the > > problem described in the ticket[1]. > > > > Thanks in advance for your time and feedback. > > Looking forward to the discussion! > > > > > > [1]https://issues.apache.org/jira/browse/FLINK-33123 > > [2] > > > > > https://docs.google.com/document/d/1e_6o4bdXcKtFL3xYxKeyKnRjR8ffsw6Z8frp3tp7u-M/edit?usp=sharing > > > > Best regards, > > Yuepeng Pan > > >
