[ https://issues.apache.org/jira/browse/FLINK-35285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868069#comment-17868069 ]

Gyula Fora commented on FLINK-35285:
------------------------------------

Hey [~trystan], sorry for the late reply.
I see the problem...

Once you are close to the max parallelism you need a higher scale-down factor, 
but you generally want to keep this fairly conservative. 

One approach we could also consider is always allowing scaling up/down to the 
nearest valid parallelism regardless of the max scale factors. I think this is 
a bit risky, and I am not much in favour of it.
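
A rough sketch of what that could look like (standalone, with assumed names, 
not the actual JobVertexScaler API):
{code:java}
// Hypothetical sketch: jump to the nearest divisor of maxParallelism in the
// requested direction, ignoring the configured max scale factors entirely.
static int nearestValidParallelism(int current, int maxParallelism, boolean scaleDown) {
    if (scaleDown) {
        for (int p = current - 1; p >= 1; p--) {
            if (maxParallelism % p == 0) {
                return p; // first divisor strictly below current
            }
        }
    } else {
        for (int p = current + 1; p <= maxParallelism; p++) {
            if (maxParallelism % p == 0) {
                return p; // first divisor strictly above current
            }
        }
    }
    return current; // no valid alternative found
}
{code}
The risk is exactly that this can overshoot the configured factor by a lot when 
the divisors are sparse.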

For your particular use case, it may be best to solve the issue by changing 
your max-parallelism setting from 120 to something like 720 or 1260. As long as 
your job parallelism is very small compared to the max parallelism and we have 
a lot of divisors, the algorithm has a lot of flexibility even with small scale 
factors. 
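
Just to illustrate the divisor point (a standalone snippet, not autoscaler code):
{code:java}
// Count the valid parallelisms (divisors) for a few max parallelism choices.
// Paste into any main() to run.
for (int max : new int[] {120, 720, 1260}) {
    int divisors = 0;
    for (int p = 1; p <= max; p++) {
        if (max % p == 0) {
            divisors++;
        }
    }
    System.out.println(max + " -> " + divisors + " divisors");
}
// Prints: 120 -> 16 divisors, 720 -> 30 divisors, 1260 -> 36 divisors.
{code}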

In other words, you could choose your max parallelism based on the desired 
scale factor. For small scale factors like 0.1 you will need a max parallelism 
that is high compared to the usual parallelism. You can still set a vertex max 
parallelism in the autoscaler to avoid scaling very high in practice.
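
If you want to sanity-check a candidate max parallelism against a desired 
scale-down factor, something like this helps (a simplification: it assumes a 
0.1 step from parallelism p needs a divisor of maxParallelism in 
[ceil(0.9 * p), p) to make progress):
{code:java}
// Standalone check: list parallelisms that cannot make a 0.1 scale-down step.
// Paste into any main() to run.
int maxParallelism = 720; // candidate setting, adjust as needed
double factor = 0.1;
for (int p = 2; p <= 120; p++) {
    boolean canScaleDown = false;
    for (int d = (int) Math.ceil((1 - factor) * p); d < p; d++) {
        if (maxParallelism % d == 0) {
            canScaleDown = true;
            break;
        }
    }
    if (!canScaleDown) {
        System.out.println("stuck at p=" + p);
    }
}
{code}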

> Autoscaler key group optimization can interfere with scale-down.max-factor
> --------------------------------------------------------------------------
>
>                 Key: FLINK-35285
>                 URL: https://issues.apache.org/jira/browse/FLINK-35285
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: Trystan
>            Priority: Minor
>
> When setting a less aggressive scale-down limit, the key group optimization 
> can prevent a vertex from scaling down at all. It hunts from the target 
> upwards to maxParallelism/2 and will always find currentParallelism again.
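> 
> A condensed sketch of that hunt (simplified from the linked JobVertexScaler code):
> {code:java}
> // Starting from the computed target, walk upwards looking for a divisor of
> // maxParallelism. Nothing prevents it from landing back on the current
> // parallelism, which undoes the scale-down entirely.
> for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
>     if (maxParallelism % p == 0) {
>         return p;
>     }
> }
> {code}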
>  
> A simple test trying to scale down from a parallelism of 60 with a 
> scale-down.max-factor of 0.2:
> {code:java}
> assertEquals(48, JobVertexScaler.scale(60, inputShipStrategies, 360, .8, 8, 360));
> {code}
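> 
> The target here is 48, but the hunt walks upwards from 48, and the first 
> divisor of 360 it reaches is 60, the current parallelism, so the vertex does 
> not scale down at all.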
>  
> It seems reasonable to make a good attempt to spread data across subtasks, 
> but not at the expense of total deadlock. The problem is that during scale-down 
> the algorithm doesn't actually ensure that newParallelism will be < 
> currentParallelism. The only workaround is to set a scale-down factor large 
> enough that it finds the next lowest divisor of the maxParallelism.
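> 
> For example, from a parallelism of 60 with a maxParallelism of 360, the next 
> lowest divisor is 45, so the scale-down factor must allow 60 -> 45 (at least 
> 0.25) before any progress is possible.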
>  
> Clunky, but something to ensure it can make at least some progress. The change 
> below makes another existing test fail, but it illustrates the point:
> {code:java}
> for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
>     // Only accept p if it actually moves in the requested direction:
>     // strictly below currentParallelism when scaling down, strictly above
>     // when scaling up. This guarantees the hunt makes progress.
>     if ((scaleFactor < 1 && p < currentParallelism)
>             || (scaleFactor > 1 && p > currentParallelism)) {
>         if (maxParallelism % p == 0) {
>             return p;
>         }
>     }
> }
> {code}
>  
> Perhaps this is by design and not a bug, but total failure to scale down in 
> order to keep optimized key groups does not seem ideal.
>  
> Key group optimization block:
> [https://github.com/apache/flink-kubernetes-operator/blob/fe3d24e4500d6fcaed55250ccc816546886fd1cf/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java#L296C1-L303C10]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
