[jira] [Updated] (FLINK-36734) Potential improvement to autoscaler algorithm

Sai Sharath Dandi (Jira) Fri, 15 Nov 2024 20:52:05 -0800


     [ https://issues.apache.org/jira/browse/FLINK-36734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sai Sharath Dandi updated FLINK-36734:
--------------------------------------
    Description: Currently, the autoscaler algorithm tries to keep all job 
vertices at X% utilization (default 70), as measured by the busytime metrics. 
However, depending on the job topology, it can be impossible to keep every 
vertex at 70% utilization (imagine a topology with more than 10 vertices). The 
autoscaler algorithm should be smart enough to set a better default utilization 
target based on the number of vertices: 0.7 * 100 * TM_CPU / vertex count could 
be a better starting point than the current default value. We may even consider 
allowing a different utilization target per vertex and coming up with a better 
default utilization target per vertex.  (was: Currently, the autoscaler algorithm tries 
to keep all the Job vertices at X% utilization(default 70) measured by the 
busytime metrics. However, it is impossible to keep all the vertices at 70% 
utilization depending on the Job topology (Imagine a long job topology > 10 
vertices). The autoscaler algorithm should be smart enough to set a better 
default utilization target depending on the topology length. 0.7 * 100 * TM_CPU 
/ topology_length could be a better starting point than current default value. 
We may even consider to allow different utilization target per vertex and come 
up with a better default utilization target per vertex)

> Potential improvement to autoscaler algorithm
> ---------------------------------------------
>
>                 Key: FLINK-36734
>                 URL: https://issues.apache.org/jira/browse/FLINK-36734
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler
>            Reporter: Sai Sharath Dandi
>            Priority: Minor
>
> Currently, the autoscaler algorithm tries to keep all job vertices at X% 
> utilization (default 70), as measured by the busytime metrics. However, 
> depending on the job topology, it can be impossible to keep every vertex at 
> 70% utilization (imagine a topology with more than 10 vertices). The 
> autoscaler algorithm should be smart enough to set a better default 
> utilization target based on the number of vertices: 0.7 * 100 * TM_CPU / 
> vertex count could be a better starting point than the current default 
> value. We may even consider allowing a different utilization target per 
> vertex and coming up with a better default utilization target per vertex.
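
To make the arithmetic in the description concrete, here is a minimal sketch of the proposed formula. It is not taken from the Flink autoscaler code. The function name, the parameter names, and the optional cap are hypothetical, and it assumes TM_CPU means the number of CPU cores per TaskManager, which the ticket does not define.

```python
def proposed_utilization_target(tm_cpu, vertex_count, base_target=0.7, cap_percent=None):
    """Evaluate 0.7 * 100 * TM_CPU / vertex count from the ticket, in percent.

    Assumptions (not stated in the ticket):
    - tm_cpu is the number of CPU cores per TaskManager.
    - cap_percent is an optional upper bound added for this sketch only.
    """
    if tm_cpu <= 0:
        raise ValueError("tm_cpu must be positive")
    if vertex_count <= 0:
        raise ValueError("vertex_count must be positive")

    # The formula exactly as written in the issue description.
    target = base_target * 100 * tm_cpu / vertex_count

    # Hypothetical safeguard: small topologies can push the raw value above
    # 100%, so a caller may want to clamp it (for example, to the current default).
    if cap_percent is not None:
        target = min(target, cap_percent)
    return target


if __name__ == "__main__":
    # Show how the suggested target shrinks as the topology grows (1 CPU per TM).
    for vertices in (2, 5, 10, 20):
        target = proposed_utilization_target(tm_cpu=1, vertex_count=vertices)
        print(f"{vertices:>2} vertices -> {target:.1f}%")
```

With one CPU per TaskManager and 10 vertices, the formula gives a target of roughly 7% per vertex, compared with the fixed 70% default. With few vertices and several CPUs the raw value can exceed 100%, which is why the sketch offers the optional cap.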



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
