[jira] [Updated] (FLINK-36734) Potential issue in autoscaler algorithm

Sai Sharath Dandi (Jira) Sat, 16 Nov 2024 05:16:17 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-36734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sai Sharath Dandi updated FLINK-36734:
--------------------------------------
    Description: Currently, the autoscaler algorithm tries to keep all job 
vertices at X% utilization (default 70), as measured by the busy-time metrics. 
However, depending on the job topology, it can be impossible to keep every 
vertex at 70% utilization (imagine a long topology with more than 10 
vertices). The autoscaler algorithm should set a better default utilization 
target based on the topology length. 0.7 * 100 * TM_CPU / topology_length 
could be a good starting point.  (was: Currently, the autoscaler algorithm 
tries to keep all job vertices at X% utilization (default 70). However, 
depending on the job topology, it can be impossible to keep every vertex at 
70% utilization (imagine a long topology with more than 10 vertices). The 
autoscaler algorithm should set a better default utilization target based on 
the topology length. 0.7 * 100 * TM_CPU / topology_length could be a good 
starting point.)

> Potential issue in autoscaler algorithm
> ---------------------------------------
>
>                 Key: FLINK-36734
>                 URL: https://issues.apache.org/jira/browse/FLINK-36734
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler
>            Reporter: Sai Sharath Dandi
>            Priority: Minor
>
> Currently, the autoscaler algorithm tries to keep all job vertices at X% 
> utilization (default 70), as measured by the busy-time metrics. However, 
> depending on the job topology, it can be impossible to keep every vertex at 
> 70% utilization (imagine a long topology with more than 10 vertices). The 
> autoscaler algorithm should set a better default utilization target based 
> on the topology length. 0.7 * 100 * TM_CPU / topology_length could be a 
> good starting point.
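
For illustration, the proposed starting point can be worked through as a 
small calculation. This is only a sketch of the arithmetic in the description, 
not autoscaler code: the function names are hypothetical, TM_CPU is assumed 
to mean the number of CPU cores per TaskManager, topology_length is assumed 
to mean the number of job vertices, and the clamp to the existing default 
target is an assumption that the description does not state.

```python
def proposed_utilization_target(tm_cpu, topology_length, base_target=0.7):
    """Per-vertex utilization target in percent.

    Implements the formula from the description:
        base_target * 100 * TM_CPU / topology_length
    """
    if tm_cpu <= 0:
        raise ValueError("tm_cpu must be positive")
    if topology_length < 1:
        raise ValueError("topology_length must be at least 1")
    return base_target * 100 * tm_cpu / topology_length


def clamped_utilization_target(tm_cpu, topology_length, base_target=0.7):
    """Same formula, capped at the existing default target.

    The cap is an assumption added for this sketch: for short topologies
    on multi-core TaskManagers the raw formula exceeds 100%.
    """
    raw = proposed_utilization_target(tm_cpu, topology_length, base_target)
    return min(raw, base_target * 100)


if __name__ == "__main__":
    # One core shared by a 10-vertex topology, then four cores.
    for cores, length in [(1, 10), (4, 10), (4, 2)]:
        print(cores, length,
              proposed_utilization_target(cores, length),
              clamped_utilization_target(cores, length))
```

With one core and 10 vertices the formula yields a target of about 7% per 
vertex, and with four cores about 28%. For four cores and two vertices the 
raw value is about 140%, which is why this sketch caps the result.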



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
