mxm commented on PR #751: URL: https://github.com/apache/flink-kubernetes-operator/pull/751#issuecomment-1882871255
> ## 1. Why does autoscaler need to aware the cluster capacity?

The most pressing issue is to solve job starvation in a scenario where hundreds of pipelines scale up at the same time. A typical scenario when this happens is an outage after which all pipelines need to catch up again.

> It only works for flink version before 1.18 or without adaptive scheduler, right?

It currently works regardless of the Flink version. Even with the resource requirements API, we wouldn't want to scale any pipelines unless their resource requirements can be met. Otherwise we risk rescaling too often or scaling to an undesirable parallelism configuration. That said, many users will stay on Flink 1.16 for the foreseeable future.

> As I understand, the Adaptive Scheduler supports set lowerBound and upperBound for each task vertex. And autoscaler can set lowerBound=1 and upperBound = expectedParalleslism.

That is true. Setting the lower bound to 1 is tricky, though, because it means the job can rescale to a parallelism below the expected one while giving up its current parallelism. Ideally, the autoscaler controls the full lifecycle of the parallelism configuration.

> When kubernetes or yarn have enough resources, flink job will run task with expectedParalleslism When the available resource < expectedParalleslism, flink job will run task with available resource.

We may not want that, though. We want to control the scaling because we have certain SLOs that we want to guarantee when we scale.

> ## 2. Is there any problems when multiple jobs are scaling at the same time?

The relevant code paths are thread-safe. Only one pipeline can run through the capacity check at a time. The actual check is very quick.

> Assuming we have 2 jobs: jobA and jobB. The available resource is 10 CPU before their scaling.
>
> The JobA needs 10 CPU for this scaling, and the JobB needs 10 CPU for this scaling. During the resource checking phase, both of them think resource is enough. But during start job, the resource isn't enough for them. Because they need 20 CPUs.

That cannot happen, because whichever pipeline scales first reserves the available resources, and they won't be available afterwards until they are freed again. There is a reservation system to ensure this.

> Or is it possible that other services outside of `flink-kubernetes-operator` are also submitting jobs to the same resource pool? If yes, there will be similar problems.

Yes, that can happen. It is a heuristic. New jobs are rarely started and don't typically coincide with an outage scenario. But we will detect the new resource usage as soon as we refresh the resource view at a configurable interval. We could also refresh on every rescaling, but I wanted to limit the number of API calls.
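To make the jobA/jobB discussion concrete, the synchronized check-and-reserve behavior could be sketched roughly as below. All names here (`ClusterCapacityReserver`, `tryReserve`, `release`) are illustrative, not the actual operator API; the real implementation also refreshes its resource view from the cluster at a configurable interval.

```java
// Hypothetical sketch of a reservation-based capacity check.
// Names are illustrative; the actual flink-kubernetes-operator code differs.
public class ClusterCapacityReserver {

    // Free capacity as last observed; refreshed periodically in the real system.
    private double freeCpu;

    public ClusterCapacityReserver(double freeCpu) {
        this.freeCpu = freeCpu;
    }

    // synchronized: only one pipeline can run through the check at a time.
    public synchronized boolean tryReserve(double requiredCpu) {
        if (requiredCpu > freeCpu) {
            // Deny the rescale instead of letting the job grab partial resources.
            return false;
        }
        // Reserve immediately so a concurrent job cannot double-book the same CPUs.
        freeCpu -= requiredCpu;
        return true;
    }

    // Called once a job frees its resources again.
    public synchronized void release(double cpu) {
        freeCpu += cpu;
    }

    public static void main(String[] args) {
        ClusterCapacityReserver reserver = new ClusterCapacityReserver(10);
        System.out.println(reserver.tryReserve(10)); // jobA: true, reserves the pool
        System.out.println(reserver.tryReserve(10)); // jobB: false, pool exhausted
        reserver.release(10);
        System.out.println(reserver.tryReserve(10)); // true again after release
    }
}
```

With a 10-CPU pool, only one of the two 10-CPU requests succeeds; the second is rejected until the reservation is released, which is exactly why the double-booking scenario above cannot occur.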
