mxm opened a new pull request, #751:
URL: https://github.com/apache/flink-kubernetes-operator/pull/751

   To avoid starvation of pipelines when the Kubernetes cluster runs out of resources, new scaling attempts should be stopped when no additional pods can be scheduled for the rescaling.
   
   While Flink's ResourceRequirement API can prevent some of these cases, it requires Flink 1.18 and an entirely different Flink scheduler. Extensive testing of the new scheduler and its rescaling behavior is still pending. We would also hand off control over the rescale time to Flink, which uses various parameters to control the exact scaling behavior.
   
   For the config-based parallelism overrides, we have pretty good heuristics in the operator: we can query Kubernetes for the approximate amount of free cluster resources, the maximum cluster scale-up allowed by the Cluster Autoscaler, and the required scaling cost. Having cluster resource information will also allow us to implement fairness between all the autoscaled pipelines.
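   For illustration only (the class and the heuristic below are assumptions, not code from this PR), the required scaling cost can be approximated from the parallelism change and the task slots per TaskManager:

```java
/** Hypothetical sketch: estimate the extra TaskManager pods a rescale would require. */
final class ScalingCostEstimator {

    /**
     * Additional TaskManager pods needed when total parallelism grows from
     * {@code currentParallelism} to {@code targetParallelism}, assuming
     * {@code slotsPerTaskManager} task slots per pod.
     */
    static int extraTaskManagers(
            int currentParallelism, int targetParallelism, int slotsPerTaskManager) {
        int currentTms = ceilDiv(currentParallelism, slotsPerTaskManager);
        int targetTms = ceilDiv(targetParallelism, slotsPerTaskManager);
        return Math.max(0, targetTms - currentTms);
    }

    private static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }
}
```

   Multiplying the resulting pod count by the TaskManager CPU/memory requests gives a rough estimate of the resources that must be free in the cluster for the rescale to succeed.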
   
   This PR adds ClusterResourceManager, which provides a view over the allocatable resources within a Kubernetes cluster and allows simulating the scheduling of pods with a defined set of required resources.
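   Conceptually, the simulation works like the sketch below. The class and method names are hypothetical and only illustrate the idea; the actual API in this PR may differ:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a cluster resource view that simulates pod scheduling. */
class ClusterResourceView {

    /** Free CPU (cores) and memory (bytes) remaining on a single node. */
    static final class Node {
        double freeCpu;
        long freeMemBytes;

        Node(double freeCpu, long freeMemBytes) {
            this.freeCpu = freeCpu;
            this.freeMemBytes = freeMemBytes;
        }
    }

    private final List<Node> nodes;

    ClusterResourceView(List<Node> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    /**
     * Simulates scheduling {@code numPods} pods, each requesting the given CPU and memory.
     * Reserves the resources and returns true only if every pod fits on some node.
     */
    boolean trySchedule(int numPods, double cpuPerPod, long memPerPod) {
        List<Node> snapshot = new ArrayList<>();
        nodes.forEach(n -> snapshot.add(new Node(n.freeCpu, n.freeMemBytes)));
        for (int i = 0; i < numPods; i++) {
            if (!placeOnFirstFit(snapshot, cpuPerPod, memPerPod)) {
                return false; // roll back: the real view stays untouched
            }
        }
        nodes.clear();
        nodes.addAll(snapshot);
        return true;
    }

    private static boolean placeOnFirstFit(List<Node> candidates, double cpu, long mem) {
        for (Node node : candidates) {
            if (node.freeCpu >= cpu && node.freeMemBytes >= mem) {
                node.freeCpu -= cpu;
                node.freeMemBytes -= mem;
                return true;
            }
        }
        return false;
    }
}
```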
   
   The goal is to provide a good indicator for whether the resources needed for autoscaling are going to be available. This is achieved by pulling the node resource usage from the Kubernetes cluster at a regular, configurable interval, and then using this data to simulate adding / removing resources (pods). Note that this is merely a (pretty good) heuristic because the Kubernetes scheduler has the final say. However, it prevents 99% of the problematic scenarios after pipeline outages, where all pipelines may be scaled up at the same time and exhaust the available cluster resources.
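   A rough sketch of the refresh-then-simulate flow described above; the interval and loader below stand in for a new operator config option and the Kubernetes API call, and are assumptions rather than the code in this PR:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical sketch: re-read a resource view at a regular, configurable interval. */
final class PeriodicResourceView<T> {

    private final Supplier<T> loader;        // e.g. lists nodes and their free CPU/memory
    private final Duration refreshInterval;  // e.g. a configurable operator option

    private T cached;
    private Instant lastRefresh;

    PeriodicResourceView(Supplier<T> loader, Duration refreshInterval) {
        this.loader = loader;
        this.refreshInterval = refreshInterval;
    }

    /** Returns the cached view, refreshing it from the Kubernetes API when it is stale. */
    synchronized T get() {
        Instant now = Instant.now();
        if (lastRefresh == null
                || Duration.between(lastRefresh, now).compareTo(refreshInterval) >= 0) {
            cached = loader.get();
            lastRefresh = now;
        }
        return cached;
    }
}
```

   Scaling decisions would then run the scheduling simulation against the cached view, and scale-ups that do not fit would be skipped instead of producing unschedulable TaskManager pods.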
   
   The simulation can run on a fixed set of Kubernetes nodes. Additionally, if we detect that the cluster is using the Kubernetes Cluster Autoscaler, we use this information to extrapolate the number of nodes up to the maximum node count defined in the autoscaler configuration. We currently track CPU and memory. Ephemeral storage is missing because there is no easy way to get node statistics on free storage.
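   The extrapolation could look roughly like the sketch below, where the maximum node count would come from the Cluster Autoscaler configuration; the names and the template-node assumption are illustrative only, not the code in this PR:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: extend the node list up to the Cluster Autoscaler's maximum. */
final class NodeExtrapolation {

    /** CPU (cores) and memory (bytes) a single node can offer to pods. */
    static final class NodeCapacity {
        final double cpu;
        final long memBytes;

        NodeCapacity(double cpu, long memBytes) {
            this.cpu = cpu;
            this.memBytes = memBytes;
        }
    }

    /**
     * Returns the currently known nodes plus synthetic nodes up to {@code maxNodes}, with
     * each synthetic node assumed to match the given template node. Only CPU and memory
     * are modeled; ephemeral storage is not tracked.
     */
    static List<NodeCapacity> extrapolate(
            List<NodeCapacity> currentNodes, NodeCapacity template, int maxNodes) {
        List<NodeCapacity> result = new ArrayList<>(currentNodes);
        while (result.size() < maxNodes) {
            result.add(template);
        }
        return result;
    }
}
```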
   

