[ https://issues.apache.org/jira/browse/FLINK-35285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843254#comment-17843254 ]

Gyula Fora commented on FLINK-35285:
------------------------------------

I think you make a good point here, but we have to be careful about how much 
key group skew we allow while scaling up or down.

When we scale down to a parallelism that doesn't result in an even key group 
distribution, our computed expected throughput will be off: that estimate 
assumes throughput scales linearly with parallelism, which only holds when the 
key groups are evenly distributed, i.e. when the scaling itself introduces no 
data skew.

However, this is something we can actually calculate by looking at how uneven 
the key group distribution is. As long as the introduced skew stays within the 
flexible target rate boundaries, we should be able to scale down without 
expecting a "rebound".
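
Roughly something like this (just a sketch to illustrate the idea; the names 
are made up, not the operator's actual API):

{code:java}
// Illustrative sketch only: estimates the skew introduced by rounding the
// parallelism and checks it against a flexible target boundary. Names and
// thresholds are hypothetical, not the autoscaler's real API.
public final class KeyGroupSkewEstimator {

    /**
     * Relative skew from spreading maxParallelism key groups over the given
     * parallelism: ratio of the largest subtask's key group count to a
     * perfectly even share (1.0 means no skew).
     */
    public static double introducedSkew(int maxParallelism, int parallelism) {
        int largestShare = (int) Math.ceil((double) maxParallelism / parallelism);
        double evenShare = (double) maxParallelism / parallelism;
        return largestShare / evenShare;
    }

    /**
     * Scaling down is acceptable when the expected per-subtask rate, inflated
     * by the introduced skew, still fits within the flexible target boundary.
     */
    public static boolean withinTargetBoundary(
            double expectedRatePerSubtask, double skew, double maxAllowedRatePerSubtask) {
        return expectedRatePerSubtask * skew <= maxAllowedRatePerSubtask;
    }

    public static void main(String[] args) {
        // 360 key groups over 48 subtasks: 24 subtasks get 8 groups, 24 get 7,
        // so the busiest subtask carries 8 / 7.5 = ~6.7% more than the even share.
        System.out.printf("skew = %.3f%n", introducedSkew(360, 48));
    }
}
{code}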

> Autoscaler key group optimization can interfere with scale-down.max-factor
> --------------------------------------------------------------------------
>
>                 Key: FLINK-35285
>                 URL: https://issues.apache.org/jira/browse/FLINK-35285
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: Trystan
>            Priority: Minor
>
> When setting a less aggressive scale-down limit, the key group optimization 
> can prevent a vertex from scaling down at all. It hunts upwards from the 
> target parallelism to maxParallelism/2, and will always land on 
> currentParallelism again.
>  
> A simple test trying to scale down from a parallelism of 60 with a 
> scale-down.max-factor of 0.2:
> {code:java}
> assertEquals(48, JobVertexScaler.scale(60, inputShipStrategies, 360, .8, 8, 360));
> {code}
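> For illustration (my reading of the rounding step, not the exact operator 
> code), with maxParallelism = 360 the divisor hunt starting at the target of 
> 48 walks straight back up to the current parallelism:
> {code:java}
> // Illustrative only: mirrors the upward divisor hunt for maxParallelism = 360.
> public class DivisorHuntExample {
>     public static void main(String[] args) {
>         int maxParallelism = 360;
>         int target = 48; // 60 * 0.8
>         for (int p = target; p <= maxParallelism / 2; p++) {
>             if (maxParallelism % p == 0) {
>                 // The first divisor of 360 at or above 48 is 60, i.e. the
>                 // current parallelism, so the "scale-down" is a no-op.
>                 System.out.println("chosen parallelism = " + p); // prints 60
>                 break;
>             }
>         }
>     }
> }
> {code}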
>  
> It seems reasonable to make a good attempt to spread data evenly across 
> subtasks, but not at the expense of blocking scale-down entirely. The problem 
> is that during scale-down the optimization doesn't actually ensure that 
> newParallelism will be < currentParallelism. The only workaround is to set a 
> scale-down factor large enough that the target reaches the next lower divisor 
> of the maxParallelism.
>  
> Clunky, but something like this would ensure it can make at least some 
> progress. Another existing test now fails with this change, but it 
> illustrates the point:
> {code:java}
> for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
>     if ((scaleFactor < 1 && p < currentParallelism)
>             || (scaleFactor > 1 && p > currentParallelism)) {
>         if (maxParallelism % p == 0) {
>             return p;
>         }
>     }
> }
> {code}
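>
> With the numbers from the test above (current parallelism 60, maxParallelism 
> 360, target 48) there is no divisor of 360 strictly between 48 and 60, so the 
> guarded loop falls through and the unadjusted target of 48 can be used, which 
> is what the assertEquals(48, ...) above expects.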
>  
> Perhaps this is by design and not a bug, but a total failure to scale down 
> in order to keep key groups optimized does not seem ideal.
>  
> Key group optimization block:
> [https://github.com/apache/flink-kubernetes-operator/blob/fe3d24e4500d6fcaed55250ccc816546886fd1cf/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java#L296C1-L303C10]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)