Thanks, Yuepeng, for the proposal. Overall LGTM. However, I'm a bit concerned about the potential performance impact of changing a FORWARD edge to REBALANCE. The autoscaler currently assumes a linear relationship between throughput and parallelism. The edge change can easily break this assumption, as REBALANCE introduces extra shuffling and results in higher CPU usage and network memory consumption. I suggest accounting for this on the algorithm side as well.
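
To illustrate the concern, here is a minimal sketch, not the autoscaler's actual code. The function names and the `shuffle_overhead` fraction are hypothetical, and the 25% figure is an arbitrary assumption rather than a measurement. It compares the parallelism a purely linear model would pick with what would be needed if each subtask lost some capacity to shuffling after the edge change:

```python
import math


def linear_parallelism(current_parallelism: int, current_rate: float, target_rate: float) -> int:
    """Parallelism predicted when throughput is assumed to scale linearly with parallelism."""
    per_subtask_rate = current_rate / current_parallelism
    return math.ceil(target_rate / per_subtask_rate)


def parallelism_with_shuffle_cost(
    current_parallelism: int,
    current_rate: float,
    target_rate: float,
    shuffle_overhead: float,
) -> int:
    """Same estimate, but each subtask loses a hypothetical fraction of its capacity
    (shuffle_overhead, between 0 and 1) once the edge is no longer FORWARD."""
    per_subtask_rate = (current_rate / current_parallelism) * (1.0 - shuffle_overhead)
    return math.ceil(target_rate / per_subtask_rate)


# 4 subtasks currently handle 40k records/s; the target is 80k records/s.
predicted = linear_parallelism(4, 40_000.0, 80_000.0)
# Assumed (not measured) 25% capacity loss per subtask after the switch.
needed = parallelism_with_shuffle_cost(4, 40_000.0, 80_000.0, shuffle_overhead=0.25)
print(predicted, needed)  # prints "8 11"
```

Under these assumed numbers the linear model would scale to 8 while 11 subtasks would actually be needed, so the vertex would stay under-provisioned after the rescale.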

Best,
Zhanghao Chen
________________________________
From: Yuepeng Pan <[email protected]>
Sent: Tuesday, January 13, 2026 23:46
To: [email protected] <[email protected]>
Subject: [DISCUSS] A design proposal to fix the wrong dynamic replacement of partitioner from FORWARD to REBLANCE for AutoScaler and AdaptiveScheduler

Hi community,

I would like to start a discussion around the issue described in **FLINK-33123[1]**.

This issue can mainly be broken down into two parts:

a). Assuming that initially two upstream and downstream JobVertices connected by a FORWARD edge have the same parallelism, due to a rescale operation their parallelism becomes different. In this case, the current strategy may produce incorrect results when rebuilding the upstream–downstream network partition connections.

b). Assuming that the parallelism of two upstream and downstream JobVertices is different, but due to a rescale operation their parallelism needs to be adjusted to be the same. In this scenario, it is not possible to determine the partition type after the rescale.

So, I'd like to share a design proposal[2] that attempts to address the problem described in the ticket[1].

Thanks in advance for your time and feedback.
Looking forward to the discussion!

[1] https://issues.apache.org/jira/browse/FLINK-33123
[2] https://docs.google.com/document/d/1e_6o4bdXcKtFL3xYxKeyKnRjR8ffsw6Z8frp3tp7u-M/edit?usp=sharing

Best regards,
Yuepeng Pan
