potiuk commented on issue #26587: URL: https://github.com/apache/airflow/issues/26587#issuecomment-1254832076
This is currently not possible, and it is a Kubernetes limitation rather than an Airflow problem. The only approach that avoids it is:

1) use CeleryKubernetesExecutor
2) assign all your long-running tasks to the Kubernetes queue
3) set the worker pods' termination grace period (`terminationGracePeriodSeconds`) to be longer than the longest task that you still run via the Celery executor

This approach works because workers that are being downscaled are put into an offline state and have enough time to complete all their tasks before they are killed.

Longer explanation: Currently "stock" Kubernetes does not allow you to choose which Pod of a ReplicaSet or Deployment is removed on downscale. It picks one at random, and there is no way to change that so that, for example, the Pod that should be killed is the one that gets killed. The Kubernetes team is opposed to implementing a solution, despite a number of people trying to convince them. The latest attempt (which was actually originated by @thesuperzapper, largely because of his Airflow Helm Chart) is at https://github.com/kubernetes/kubernetes/issues/107598 and is actively discussed, but even if it is implemented, it will take multiple months before it is released in a new version of Kubernetes.
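To make the routing rule in steps 2 and 3 concrete, here is a minimal sketch in plain Python. It is not Airflow code: `TaskProfile`, `pick_queue`, and the safety margin are hypothetical names introduced only for illustration. The queue names are assumptions too; `kubernetes` is the default value of the `kubernetes_queue` option for CeleryKubernetesExecutor and `default` is the default Celery queue, so adjust both to your deployment.

```python
from dataclasses import dataclass

# Assumed queue names; change them to match your Airflow configuration.
KUBERNETES_QUEUE = "kubernetes"
CELERY_QUEUE = "default"


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical description of a task; this is not an Airflow class."""

    task_id: str
    max_runtime_seconds: int


def pick_queue(task, grace_period_seconds, safety_margin_seconds=60):
    """Keep a task on Celery only if it finishes well within the grace period.

    Anything that might still be running when a downscaled worker reaches
    the end of its termination grace period is sent to the Kubernetes queue.
    """
    if task.max_runtime_seconds + safety_margin_seconds < grace_period_seconds:
        return CELERY_QUEUE
    return KUBERNETES_QUEUE


if __name__ == "__main__":
    grace_period = 600  # seconds, i.e. the workers' termination grace period
    for task in (
        TaskProfile("quick_export", 300),
        TaskProfile("nightly_training", 6 * 3600),
    ):
        print(task.task_id, "->", pick_queue(task, grace_period))
```

In a real DAG the returned name would typically be passed as the operator's `queue` argument. The margin is a judgment call: it only needs to be large enough that a task picked up just before the worker goes offline can still finish.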
