[GitHub] [airflow] potiuk commented on issue #26587: Airflow tasks failing with SIGTERM when worker pod downscaling.

GitBox Thu, 22 Sep 2022 03:28:49 -0700


potiuk commented on issue #26587:
URL: https://github.com/apache/airflow/issues/26587#issuecomment-1254832076


   This is currently not possible, and it is a Kubernetes (K8S) limitation, not an Airflow problem.
The only approach that avoids it is:
   
   1) use CeleryKubernetesExecutor
   2) assign all your long-running tasks to the Kubernetes queue
   3) set the termination grace period of your Celery worker pods (terminationGracePeriodSeconds in Kubernetes) to be longer than the longest task you run via the Celery executor
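The routing rule in steps 2 and 3 can be sketched as a small planning helper. This is a minimal sketch, assuming you can estimate each task's worst-case runtime. `TaskSpec`, `plan_queues`, the 30-minute threshold and the 60-second safety margin are hypothetical illustrations, not Airflow APIs. The queue names assume Airflow's defaults (`kubernetes` for CeleryKubernetesExecutor's Kubernetes queue, `default` for Celery); check your own configuration.

```python
from dataclasses import dataclass

# Assumed Airflow defaults: [celery_kubernetes_executor] kubernetes_queue
# and [operators] default_queue. Adjust if your configuration differs.
KUBERNETES_QUEUE = "kubernetes"
CELERY_QUEUE = "default"


@dataclass
class TaskSpec:
    """Hypothetical description of a task and its worst-case runtime."""
    task_id: str
    max_runtime_s: int


def plan_queues(tasks, long_running_threshold_s, safety_margin_s=60):
    """Route long tasks to Kubernetes and size the Celery grace period.

    Returns (queue per task_id, grace period in seconds for Celery workers).
    """
    queues = {}
    longest_celery_task_s = 0
    for task in tasks:
        if task.max_runtime_s > long_running_threshold_s:
            # Step 2: long-running tasks get their own Kubernetes pods.
            queues[task.task_id] = KUBERNETES_QUEUE
        else:
            queues[task.task_id] = CELERY_QUEUE
            longest_celery_task_s = max(longest_celery_task_s, task.max_runtime_s)
    # Step 3: the grace period must exceed the longest task left on Celery.
    grace_period_s = longest_celery_task_s + safety_margin_s
    return queues, grace_period_s


if __name__ == "__main__":
    tasks = [
        TaskSpec("extract", 300),
        TaskSpec("train_model", 4 * 3600),
        TaskSpec("notify", 30),
    ]
    queues, grace_period_s = plan_queues(tasks, long_running_threshold_s=1800)
    print(queues)          # {'extract': 'default', 'train_model': 'kubernetes', 'notify': 'default'}
    print(grace_period_s)  # 360
```

In a real DAG you would then pass the chosen queue to each operator through its `queue` argument, and put the computed value into the worker pods' grace period setting.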
   
   This approach works because Celery workers that are being downscaled go into an offline state (they stop picking up new tasks) and have enough time to complete their running tasks before they are killed.
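A simplified model of what happens to one worker pod during scale-down can make the timing concrete. It assumes Kubernetes sends SIGTERM, the Celery worker then performs a warm shutdown (no new tasks, running tasks continue), and the pod is killed with SIGKILL once the grace period expires. `simulate_scale_down` is a hypothetical helper; real shutdowns also include some pod teardown overhead that this model ignores.

```python
def simulate_scale_down(remaining_runtime_s, grace_period_s):
    """Split a worker's running tasks into those that finish and those killed.

    remaining_runtime_s maps task_id -> seconds of work left when SIGTERM arrives.
    """
    completed, killed = [], []
    for task_id, remaining in sorted(remaining_runtime_s.items()):
        if remaining < grace_period_s:
            # Warm shutdown: the task keeps running after SIGTERM and finishes.
            completed.append(task_id)
        else:
            # SIGKILL arrives when the grace period expires, killing the task.
            killed.append(task_id)
    return completed, killed


if __name__ == "__main__":
    running = {"extract": 120, "train_model": 7200, "notify": 5}
    print(simulate_scale_down(running, grace_period_s=600))
    # (['extract', 'notify'], ['train_model'])
```

Any task whose remaining runtime exceeds the grace period ends up in the killed list, which is why long-running tasks are moved off Celery entirely rather than relying on a very large grace period.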
   
   Longer explanation: currently, "stock" Kubernetes does not let you choose which Pod of a ReplicaSet or Deployment is removed when scaling down. It picks the Pod itself, and there is no way to change that so that, for example, the Pod that should be killed is the one that gets killed. The Kubernetes team has so far opposed implementing a solution, despite a number of people trying to convince them. The latest attempt (originated by @thesuperzapper, largely because of his Airflow Helm Chart) is here: https://github.com/kubernetes/kubernetes/issues/107598. It is actively being discussed, but even if it is implemented, it will take multiple months before it is released in a new version of Kubernetes.
   
   

