[jira] [Commented] (FLINK-30680) Consider using the autoscaler to detect slow taskmanagers

Shammon (Jira) Wed, 18 Jan 2023 21:46:25 -0800


    [ https://issues.apache.org/jira/browse/FLINK-30680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678496#comment-17678496 ]


Shammon commented on FLINK-30680:
---------------------------------

Thanks [~gyfora] for creating this issue. In fact, our team at ByteDance has
developed a similar feature in our Flink clusters, and we are trying to apply it
in production. Our test results show that it works very well for handling slow
nodes in streaming jobs.

As for the difference between 'detecting slow TMs' and 'restarting the job', as
well as the overall proposal, [~wangm92] and [~Zhanghao Chen] can give more input.

> Consider using the autoscaler to detect slow taskmanagers
> ---------------------------------------------------------
>
>                 Key: FLINK-30680
>                 URL: https://issues.apache.org/jira/browse/FLINK-30680
>             Project: Flink
>          Issue Type: New Feature
>          Components: Autoscaler, Kubernetes Operator
>            Reporter: Gyula Fora
>            Priority: Major
>
> We could leverage logic in the autoscaler to detect slow taskmanagers by 
> comparing the per-record processing times between them.
> If we notice that all subtasks on a single TM are considerably slower than 
> the rest (at similar input rates) we should try simply restarting the job 
> instead of scaling it up.
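The detection idea in the issue description can be sketched roughly as follows. This is a minimal, hypothetical Java sketch, not code from the Flink Kubernetes Operator autoscaler: the class `SlowTaskManagerDetector`, the `SubtaskStats` record, the `Action` enum, and the `slownessFactor`/`rateTolerance` thresholds are all assumptions for illustration. It also assumes that per-subtask input rates and per-record processing times for a single job vertex have already been collected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch only; none of these names exist in the Flink autoscaler.
final class SlowTaskManagerDetector {

    // Assumed per-subtask metrics for one job vertex.
    record SubtaskStats(String taskManagerId, double inputRecordsPerSec, double processingNanosPerRecord) {}

    enum Action { RESTART_JOB, LET_AUTOSCALER_DECIDE }

    // A TM is flagged only if EVERY subtask on it sees a similar input rate to the
    // other TMs (within rateTolerance) and is at least slownessFactor times slower
    // per record than the median subtask on the other TMs.
    static Set<String> findSlowTaskManagers(List<SubtaskStats> stats, double slownessFactor, double rateTolerance) {
        Map<String, List<SubtaskStats>> byTm = new HashMap<>();
        for (SubtaskStats s : stats) {
            byTm.computeIfAbsent(s.taskManagerId(), k -> new ArrayList<>()).add(s);
        }
        Set<String> slow = new TreeSet<>();
        for (Map.Entry<String, List<SubtaskStats>> entry : byTm.entrySet()) {
            String tm = entry.getKey();
            // Baseline: all subtasks running on the other TaskManagers.
            List<SubtaskStats> others = new ArrayList<>();
            for (SubtaskStats s : stats) {
                if (!s.taskManagerId().equals(tm)) {
                    others.add(s);
                }
            }
            if (others.isEmpty()) {
                continue; // a single TM has nothing to be compared against
            }
            double baselineTime = median(others.stream().mapToDouble(SubtaskStats::processingNanosPerRecord).toArray());
            double baselineRate = median(others.stream().mapToDouble(SubtaskStats::inputRecordsPerSec).toArray());
            boolean allSlow = true;
            for (SubtaskStats s : entry.getValue()) {
                boolean similarRate = Math.abs(s.inputRecordsPerSec() - baselineRate) <= rateTolerance * baselineRate;
                boolean muchSlower = s.processingNanosPerRecord() >= slownessFactor * baselineTime;
                if (!similarRate || !muchSlower) {
                    allSlow = false;
                    break;
                }
            }
            if (allSlow) {
                slow.add(tm);
            }
        }
        return slow;
    }

    // Restart instead of scaling up when a slow TM is found; otherwise defer to normal scaling logic.
    static Action decide(List<SubtaskStats> stats, double slownessFactor, double rateTolerance) {
        return findSlowTaskManagers(stats, slownessFactor, rateTolerance).isEmpty()
                ? Action.LET_AUTOSCALER_DECIDE
                : Action.RESTART_JOB;
    }

    private static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }
}
```

Using the median of the *other* TaskManagers as the baseline keeps one slow TM from dragging the reference value toward itself. Requiring every subtask on the TM to be slow follows the issue's "all subtasks on a single TM" condition and avoids reacting to a single skewed partition.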



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
