mxm commented on PR #751: URL: https://github.com/apache/flink-kubernetes-operator/pull/751#issuecomment-1882871255
> ## 1. Why does autoscaler need to aware the cluster capacity?

The most pressing issue is to solve job starvation in a scenario where hundreds of pipelines scale up at the same time. A typical scenario when this happens is an outage after which all pipelines need to catch up again.

> It only works for flink version before 1.18 or without adaptive scheduler, right?

It currently works regardless of the Flink version. Even with the resource requirements API, we wouldn't want to scale any pipelines unless their resource requirements can be met. Otherwise we risk rescaling too often or scaling to an undesirable parallelism configuration. That said, many users will stay on Flink 1.16 for the foreseeable future.

> As I understand, the Adaptive Scheduler supports set lowerBound and upperBound for each task vertex. And autoscaler can set lowerBound=1 and upperBound = expectedParalleslism.

That is true. Setting the lower bound to 1 is tricky, though, because it means the job can rescale to a parallelism below the expected one while giving up its current parallelism. Ideally, the autoscaler controls the full lifecycle of the parallelism configuration.

> When kubernetes or yarn have enough resources, flink job will run task with expectedParalleslism When the available resource < expectedParalleslism, flink job will run task with available resource.

We may not want that, though. We want to control the scaling because we have certain SLOs that we want to guarantee when we scale.

> ## 2. Is there any problems when multiple jobs are scaling at the same time?

The relevant code paths are thread-safe. Only one pipeline can run through the capacity check at a time. The actual check is very quick.

> Assuming we have 2 jobs: jobA and jobB. The available resource is 10 CPU before their scaling.
>
> The JobA needs 10 CPU for this scaling, and the JobB needs 10 CPU for this scaling. During the resource checking phase, both of them think resource is enough. But during start job, the resource isn't enough for them. Because they need 20 CPUs.

That cannot happen, because whichever pipeline scales first reserves the available resources, and they won't be available afterwards until they are freed again. There is a reservation system to ensure this.

> Or is it possible that other services outside of `flink-kubernetes-operator` are also submitting jobs to the same resource pool? If yes, there will be similar problems.

Yes, that can happen. It is a heuristic. New jobs are rarely started and don't typically coincide with an outage scenario. But we will detect the new resource usage as soon as we refresh the resource view at a configurable interval. We could also refresh on every rescaling, but I wanted to limit the number of API calls.
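To make the jobA/jobB discussion concrete, the synchronized check-and-reserve behavior could be sketched roughly as below. All names here (`ClusterCapacityReserver`, `tryReserve`, `release`) are illustrative, not the actual operator API; the real implementation also refreshes its resource view from the cluster at a configurable interval.

```java
// Hypothetical sketch of a reservation-based capacity check.
// Names are illustrative; the actual flink-kubernetes-operator code differs.
public class ClusterCapacityReserver {

    // Free capacity as last observed; refreshed periodically in the real system.
    private double freeCpu;

    public ClusterCapacityReserver(double freeCpu) {
        this.freeCpu = freeCpu;
    }

    // synchronized: only one pipeline can run through the check at a time.
    public synchronized boolean tryReserve(double requiredCpu) {
        if (requiredCpu > freeCpu) {
            // Deny the rescale instead of letting the job grab partial resources.
            return false;
        }
        // Reserve immediately so a concurrent job cannot double-book the same CPUs.
        freeCpu -= requiredCpu;
        return true;
    }

    // Called once a job frees its resources again.
    public synchronized void release(double cpu) {
        freeCpu += cpu;
    }

    public static void main(String[] args) {
        ClusterCapacityReserver reserver = new ClusterCapacityReserver(10);
        System.out.println(reserver.tryReserve(10)); // jobA: true, reserves the pool
        System.out.println(reserver.tryReserve(10)); // jobB: false, pool exhausted
        reserver.release(10);
        System.out.println(reserver.tryReserve(10)); // true again after release
    }
}
```

With a 10-CPU pool, only one of the two 10-CPU requests succeeds; the second is rejected until the reservation is released, which is exactly why the double-booking scenario above cannot occur.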
