yuanfenghu created FLINK-35823:
----------------------------------
Summary: Introduce parameters to control the upper limit of
rescale to avoid unlimited shrinkage due to server-side bottlenecks or data
skew.
Key: FLINK-35823
URL: https://issues.apache.org/jira/browse/FLINK-35823
Project: Flink
Issue Type: Improvement
Components: Autoscaler
Reporter: yuanfenghu
Fix For: 2.0.0

1. If a Flink application writes data to an external storage system such as HDFS or Kafka, that external system can become the bottleneck of the entire job. For example, when HDFS throughput drops, write I/O time increases, and the corresponding Flink busy-time metric rises as well. The autoscaler then concludes that the parallelism must be increased to raise the write rate. In this case, however, scaling up does not help, because the bottleneck is on the external server. As a result, each subsequent evaluation cycle keeps increasing the parallelism until it reaches max-parallelism.
2. Data skew in some tasks causes the same problem: adding subtasks does not relieve the overloaded ones, so the autoscaler keeps scaling up.
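
The feedback loop in point 1 can be illustrated with a hypothetical, heavily simplified Java model. It is not the Flink autoscaler algorithm. It makes two assumptions: the autoscaler rescales in proportion to the gap between a target rate and the observed rate, and the external sink caps total throughput at a fixed value. The method `simulate` and all numbers are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the runaway scale-up described in point 1.
 * It is NOT the actual Flink autoscaler algorithm; all constants are made up.
 */
public class RunawayScaleUp {

    /**
     * Simulates successive evaluation cycles and returns the parallelism after each.
     *
     * @param initialParallelism starting parallelism of the sink operator
     * @param maxParallelism     upper bound (max-parallelism)
     * @param targetRate         records/s the job needs to sustain
     * @param perSubtaskRate     records/s one subtask could write without a bottleneck
     * @param sinkCapacity       records/s the external system can absorb in total
     */
    static List<Integer> simulate(int initialParallelism, int maxParallelism,
                                  double targetRate, double perSubtaskRate, double sinkCapacity) {
        List<Integer> history = new ArrayList<>();
        int p = initialParallelism;
        history.add(p);
        while (true) {
            // Observed throughput is capped by the external system, not by Flink.
            double observed = Math.min(p * perSubtaskRate, sinkCapacity);
            // Naive rule (assumption): scale proportionally to the missing throughput.
            int next = (int) Math.ceil(p * targetRate / observed);
            next = Math.min(next, maxParallelism);
            if (next <= p) {
                break; // target met or max-parallelism reached: no further scale-up
            }
            p = next;
            history.add(p);
        }
        return history;
    }

    public static void main(String[] args) {
        // Sink capacity (10,000 rec/s) is below the target (15,000 rec/s),
        // so no parallelism can ever satisfy the target.
        System.out.println(simulate(4, 128, 15_000, 2_000, 10_000));
        // prints [4, 8, 12, 18, 27, 41, 62, 93, 128]
    }
}
```

Because the sink capacity is below the target rate, the naive rule never converges and stops only when it hits max-parallelism.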

Therefore, we should introduce a new parameter for this check: if throughput stays essentially unchanged while parallelism keeps increasing, there is no need to scale up any further.
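
The following is one possible shape for such a check. It assumes the autoscaler remembers the parallelism and throughput observed before and after its most recent scale-up. The method `allowScaleUp` and the `minEffectiveness` threshold are hypothetical and are not existing Flink options. The check compares the realized throughput gain with the gain that would be expected if throughput grew linearly with parallelism.

```java
/**
 * Hypothetical sketch of the proposed guard: block further scale-up of a vertex
 * when the previous scale-up barely increased throughput. Names and threshold
 * are illustrative assumptions, not existing Flink configuration.
 */
public class ScaleUpGuard {

    /**
     * @param pBefore          parallelism before the most recent scale-up
     * @param tBefore          throughput (records/s) observed before it
     * @param pAfter           parallelism after the most recent scale-up
     * @param tAfter           throughput (records/s) observed after it stabilized
     * @param minEffectiveness hypothetical threshold: fraction of the linearly
     *                         expected gain that must actually materialize
     * @return true if another scale-up is allowed
     */
    static boolean allowScaleUp(int pBefore, double tBefore,
                                int pAfter, double tAfter, double minEffectiveness) {
        if (pAfter <= pBefore || tBefore <= 0) {
            return true; // nothing to judge: not a scale-up, or no baseline throughput
        }
        // Throughput we would expect if it grew linearly with parallelism.
        double expected = tBefore * pAfter / pBefore;
        double expectedGain = expected - tBefore;
        double actualGain = tAfter - tBefore;
        // Ratio of realized to expected gain; near 0 means extra subtasks did not help.
        double effectiveness = actualGain / expectedGain;
        return effectiveness >= minEffectiveness;
    }

    public static void main(String[] args) {
        // Sink-bound: 8 -> 12 subtasks, throughput stuck at 10,000 rec/s.
        System.out.println(allowScaleUp(8, 10_000, 12, 10_000, 0.1)); // false
        // Healthy: 4 -> 8 subtasks, throughput doubles.
        System.out.println(allowScaleUp(4, 8_000, 8, 16_000, 0.1));   // true
    }
}
```

Measuring the gain against a linear expectation keeps the threshold independent of absolute throughput. A sink-bound or skewed vertex yields an effectiveness near zero, so it stays at its current parallelism instead of climbing to max-parallelism.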