[ https://issues.apache.org/jira/browse/FLINK-35285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868711#comment-17868711 ]

Trystan edited comment on FLINK-35285 at 7/25/24 3:19 PM:
----------------------------------------------------------

Thank you both for explaining the nuances of this - it definitely is a 
complicated system!

Just for my own understanding, I think I am missing one piece of context that 
might help me understand the problem better. What I am proposing is simply that, 
in the case of scale down, we allow it to scale down. The target is the target, 
no? I think I am missing how that would be considered undershooting, since in 
almost all cases it would still find the ideal parallelism even with the key 
group optimization. It's just this one edge case of getting stuck due to a small 
scale-down factor combined with larger gaps between the divisors of the max 
parallelism.
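
To make that gap concrete, here is a standalone illustration using this issue's 
numbers (hypothetical code, not from the operator): with a 0.2 cap there is 
simply no divisor of 360 inside the allowed window.

{code:java}
// Standalone illustration (not operator code): with maxParallelism = 360,
// currentParallelism = 60, and a 0.2 scale-down cap, the allowed window
// is [48, 60); no divisor of 360 falls inside it.
int maxParallelism = 360;
int currentParallelism = 60;
int lowerBound = (int) Math.ceil(currentParallelism * (1 - 0.2)); // 48
for (int p = lowerBound; p < currentParallelism; p++) {
    if (maxParallelism % p == 0) {
        System.out.println("usable divisor: " + p); // never prints
    }
}
{code}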

Is the concern that by not forcing key group optimization on scale-down, we may 
wind up with one TM taking on a disproportionate share of the load, rendering 
the calculation invalid?

(For context, even without the autoscaler we very rarely pay attention to key 
group optimization, and instead tend to focus on source partition balancing. 
Our data is rarely perfectly evenly keyed, so we will always naturally have 
shifting key group skew from minute to minute anyway. It hasn't usually been a 
source of pain for us so far!)


> Autoscaler key group optimization can interfere with scale-down.max-factor
> --------------------------------------------------------------------------
>
>                 Key: FLINK-35285
>                 URL: https://issues.apache.org/jira/browse/FLINK-35285
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: Trystan
>            Priority: Minor
>
> When a less aggressive scale-down limit is set, the key group optimization 
> can prevent a vertex from scaling down at all. It hunts upward from the target 
> parallelism toward maxParallelism / 2 and, in this scenario, always finds 
> currentParallelism again.
>  
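> For reference, the optimization block linked at the bottom of this description 
> boils down to roughly this loop (a simplified sketch, not the exact source):
> {code:java}
> // Hunt upward from the computed target for a divisor of maxParallelism.
> // With a small scale-down factor, the first divisor found can be
> // currentParallelism itself, so the vertex never scales down.
> for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
>     if (maxParallelism % p == 0) {
>         return p;
>     }
> }
> {code}
>  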
> A simple test trying to scale down from a parallelism of 60 with a 
> scale-down.max-factor of 0.2:
> {code:java}
> assertEquals(48, JobVertexScaler.scale(60, inputShipStrategies, 360, .8, 8, 360));
> {code}
> The capped target is 60 * 0.8 = 48, but 48 does not divide 360, and the only 
> divisor of 360 in [48, 60] is 60 itself, so the hunt returns the current 
> parallelism and the assertion fails.
>  
> It seems reasonable to make a good attempt to spread data across subtasks, 
> but not at the expense of total deadlock. The problem is that during scale 
> down the optimization doesn't actually ensure that newParallelism will be < 
> currentParallelism. The only workaround is to set a scale-down factor large 
> enough that the window reaches the next-lower divisor of maxParallelism 
> (here, at least 0.25, since the next-lower divisor of 360 is 45 = 60 * 0.75).
>  
> The change below is clunky, but it ensures the vertex can make at least some 
> progress. It breaks one existing test, but it illustrates the point:
> {code:java}
> // Hunt upward as before, but only accept a parallelism that actually moves
> // in the direction of the scaling decision: below currentParallelism when
> // scaling down, above it when scaling up.
> for (int p = newParallelism; p <= maxParallelism / 2 && p <= upperBound; p++) {
>     if ((scaleFactor < 1 && p < currentParallelism)
>             || (scaleFactor > 1 && p > currentParallelism)) {
>         if (maxParallelism % p == 0) {
>             return p;
>         }
>     }
> }
> {code}
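> (With that guard, the 0.2-factor example above would no longer deadlock: no 
> divisor of 360 lies in [48, 60), so the loop exits without returning and the 
> method can fall back to the capped target of 48.)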
>  
> Perhaps this is by design and not a bug, but total failure to scale down in 
> order to keep optimized key groups does not seem ideal.
>  
> Key group optimization block:
> [https://github.com/apache/flink-kubernetes-operator/blob/fe3d24e4500d6fcaed55250ccc816546886fd1cf/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobVertexScaler.java#L296C1-L303C10]


