Thanks, Yuepeng, for the proposal. Overall LGTM. However, I'm a bit concerned
about the potential performance impact of changing a FORWARD edge to REBALANCE.
The autoscaler currently assumes a linear relationship between throughput and
parallelism. Changing the edge can easily break this assumption, because
REBALANCE introduces more shuffling and results in higher CPU usage and
network memory consumption. I suggest accounting for this on the algorithm
side as well.
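
To make the concern concrete, here is a minimal sketch of a linear
throughput model and of how a per-subtask overhead would skew its estimate.
The function names and the `overhead` fraction are hypothetical and chosen
only for illustration; they are not the autoscaler's actual code, and the
numbers are not measurements.

```python
import math


def required_parallelism(target_rate: float, per_subtask_rate: float) -> int:
    """Linear model: total throughput is parallelism * per-subtask rate."""
    return math.ceil(target_rate / per_subtask_rate)


def required_parallelism_with_overhead(
    target_rate: float, per_subtask_rate: float, overhead: float
) -> int:
    """Same model, but each subtask loses a fraction `overhead` of its
    capacity (hypothetical stand-in for extra shuffle cost)."""
    effective_rate = per_subtask_rate * (1.0 - overhead)
    return math.ceil(target_rate / effective_rate)


# Assumed example numbers: 10,000 records/s target, 1,000 records/s per subtask.
linear_estimate = required_parallelism(10_000, 1_000)
with_overhead = required_parallelism_with_overhead(10_000, 1_000, 0.25)
print(linear_estimate, with_overhead)
```

Under these assumed numbers, the linear estimate is lower than what the
vertex would need once the overhead applies, so a scaling decision based on
the pre-change rate could undershoot.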

Best,
Zhanghao Chen
________________________________
From: Yuepeng Pan <[email protected]>
Sent: Tuesday, January 13, 2026 23:46
To: [email protected] <[email protected]>
Subject: [DISCUSS] A design proposal to fix the wrong dynamic replacement of
partitioner from FORWARD to REBALANCE for AutoScaler and AdaptiveScheduler

Hi community,

I would like to start a discussion around the issue described in
**FLINK-33123[1]**.

This issue can mainly be broken down into two parts:
a) Two upstream and downstream JobVertices connected by a FORWARD edge
initially have the same parallelism, but a rescale operation makes their
parallelism differ. In this case, the current strategy may produce incorrect
results when rebuilding the upstream–downstream network partition connections.

b) Two upstream and downstream JobVertices initially have different
parallelism, but a rescale operation adjusts them to be the same. In this
scenario, it is not possible to determine the partition type after the
rescale.
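
The two cases can be illustrated with a small sketch. It assumes that the
partitioner declared by the user is kept alongside the one currently in
effect; `Partitioner`, `effective_partitioner`, and the `declared` argument
are hypothetical names for illustration, not Flink APIs and not necessarily
what the design proposal does.

```python
from enum import Enum


class Partitioner(Enum):
    FORWARD = "FORWARD"
    REBALANCE = "REBALANCE"


def effective_partitioner(
    declared: Partitioner, upstream_parallelism: int, downstream_parallelism: int
) -> Partitioner:
    """FORWARD is a one-to-one connection, so it is only usable when both
    sides have the same parallelism; otherwise fall back to REBALANCE."""
    if declared is Partitioner.FORWARD and upstream_parallelism != downstream_parallelism:
        return Partitioner.REBALANCE
    return declared


# Case (a): FORWARD edge, parallelism 4 -> 4 rescaled to 4 -> 8.
case_a = effective_partitioner(Partitioner.FORWARD, 4, 8)

# Case (b): parallelism 4 -> 8 rescaled back to 4 -> 4. With the declared
# partitioner remembered, FORWARD can be restored.
case_b_declared_known = effective_partitioner(Partitioner.FORWARD, 4, 4)

# If only the effective partitioner (REBALANCE) had been kept, the same
# rescale could not tell whether the edge was originally FORWARD.
case_b_declared_lost = effective_partitioner(case_a, 4, 4)

print(case_a, case_b_declared_known, case_b_declared_lost)
```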

So, I'd like to share a design proposal[2] that attempts to address the
problem described in the ticket[1].

Thanks in advance for your time and feedback.
Looking forward to the discussion!


[1] https://issues.apache.org/jira/browse/FLINK-33123
[2] https://docs.google.com/document/d/1e_6o4bdXcKtFL3xYxKeyKnRjR8ffsw6Z8frp3tp7u-M/edit?usp=sharing

Best regards,
Yuepeng Pan
