[
https://issues.apache.org/jira/browse/FLINK-35823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865280#comment-17865280
]
yuanfenghu commented on FLINK-35823:
------------------------------------
I have discussed this issue with [~fanrui]. Does anyone else in the community
have suggestions on this?
> Introduce parameters to control the upper limit of rescale to avoid unlimited
> expansion due to server-side bottlenecks or data skew.
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-35823
> URL: https://issues.apache.org/jira/browse/FLINK-35823
> Project: Flink
> Issue Type: Improvement
> Components: Autoscaler
> Reporter: yuanfenghu
> Priority: Major
> Fix For: 2.0.0
>
>
> 1. If a Flink application writes data to an external storage system, such as
> HDFS or Kafka, that external system can become the bottleneck of the entire
> job. For example, when HDFS throughput drops, write I/O time increases and
> the corresponding Flink busy metric increases with it. The autoscaler then
> decides that the parallelism needs to be increased to raise the write rate.
> Because the bottleneck is on the external server, this does not help, so
> each subsequent evaluation cycle increases the parallelism again until
> parallelism = max-parallelism.
> 2. If some tasks have data skew, the same problem occurs.
>
> Therefore, we should introduce a new parameter that controls this check: if
> throughput stays basically the same while the parallelism keeps increasing,
> there is no need to scale up any further.
>
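To make the proposal concrete, here is a minimal sketch of one way such a check could work. It is an illustration only, not the Flink autoscaler's implementation: the class ScaleUpGuard, its methods, and the minGainRatio threshold are all hypothetical names, and the "sink capped at a fixed rate" model is an assumption used to mimic an external bottleneck.

```java
/** Hypothetical helper for illustration; not part of the Flink autoscaler API. */
final class ScaleUpGuard {
    private ScaleUpGuard() {}

    /**
     * Returns true if the last scale-up raised throughput by at least
     * minGainRatio of the gain expected under perfectly linear scaling.
     * minGainRatio is an assumed tuning parameter, e.g. 0.1.
     */
    static boolean lastScaleUpWasEffective(
            int prevParallelism, double prevThroughput,
            int currParallelism, double currThroughput,
            double minGainRatio) {
        if (currParallelism <= prevParallelism || prevThroughput <= 0) {
            return true; // not a scale-up, or no baseline: do not block
        }
        double expectedGain =
                prevThroughput * (currParallelism - prevParallelism) / prevParallelism;
        double actualGain = currThroughput - prevThroughput;
        return actualGain >= minGainRatio * expectedGain;
    }

    /**
     * Simulates repeated doubling of parallelism against a sink whose total
     * throughput is capped at sinkCap (the external bottleneck). Scaling
     * stops after the first scale-up that the guard judges ineffective.
     */
    static int simulate(int start, int maxParallelism, double perSubtaskRate,
                        double sinkCap, double minGainRatio) {
        int p = start;
        double throughput = Math.min(p * perSubtaskRate, sinkCap);
        while (p < maxParallelism) {
            int next = Math.min(p * 2, maxParallelism);
            double nextThroughput = Math.min(next * perSubtaskRate, sinkCap);
            boolean effective = lastScaleUpWasEffective(
                    p, throughput, next, nextThroughput, minGainRatio);
            p = next;
            throughput = nextThroughput;
            if (!effective) {
                break; // throughput stayed flat: stop expanding
            }
        }
        return p;
    }
}
```

In this toy model, a threshold of 0 disables the guard and the parallelism runs up to the maximum, which is the behavior described in the issue. With a positive threshold, scaling stops one step after the sink cap is reached.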
--
This message was sent by Atlassian Jira
(v8.20.10#820010)