arkadiusz-bach commented on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1230535103

   @V0lantis a task might get stuck in the queued state for at least these reasons:
   - Redis crashed after receiving the task and either had no time to save its state to disk or was not configured to persist it, so the task was lost while the scheduler still thinks it is waiting to be picked up by a worker
   - terminationGracePeriodSeconds on your worker pods is too low or not set at all (the Kubernetes default is 30 seconds)
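   Both causes leave the scheduler's record in `queued` with nothing actually working on it. Here is a minimal sketch of spotting such tasks, assuming you can export task ids, states and queue timestamps. The `TaskRecord` type, the `find_stuck_queued` function and the 30-minute threshold are hypothetical and are not Airflow APIs:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class TaskRecord:
    """Hypothetical, simplified view of a task instance as the scheduler sees it."""

    task_id: str
    state: str           # e.g. "queued", "running", "success"
    queued_at: datetime  # when the task was handed to the broker


def find_stuck_queued(
    tasks: Iterable[TaskRecord], now: datetime, threshold: timedelta
) -> List[str]:
    """Return ids of tasks that have been 'queued' for longer than `threshold`."""
    return [
        t.task_id
        for t in tasks
        if t.state == "queued" and now - t.queued_at > threshold
    ]


if __name__ == "__main__":
    now = datetime(2022, 8, 29, 12, 0, tzinfo=timezone.utc)
    tasks = [
        TaskRecord("extract", "queued", now - timedelta(minutes=45)),  # likely lost
        TaskRecord("transform", "queued", now - timedelta(minutes=2)),  # just queued
        TaskRecord("load", "running", now - timedelta(hours=1)),  # not queued at all
    ]
    print(find_stuck_queued(tasks, now, timedelta(minutes=30)))  # ['extract']
```

   A timestamp threshold cannot tell a lost task from one that is merely waiting for a free worker slot, so treat the result as a list of candidates to inspect.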
    
   The message `worker: Warm shutdown` means that Celery received a SIGTERM signal and started a graceful shutdown. It will not pick any more tasks from the Redis queue, and it will wait for all of the running tasks to finish.
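   The difference between a warm shutdown and the kill that may follow it can be modelled in a few lines. This is a toy sketch using only the Python standard library and is not how Celery implements it. `WarmShutdownWorker`, `demo` and the timings are hypothetical. On SIGTERM the worker stops pulling from the queue but lets the task it is already running finish:

```python
import os
import queue
import signal
import threading
import time
from typing import Callable, List, Tuple


class WarmShutdownWorker:
    """Toy model of a warm shutdown; not Celery's actual implementation."""

    def __init__(self, tasks: "queue.Queue[Callable[[], str]]", concurrency: int = 1):
        self.tasks = tasks
        self.concurrency = concurrency
        self.accepting = threading.Event()
        self.accepting.set()
        self.results: List[str] = []
        self._lock = threading.Lock()

    def request_shutdown(self, signum=None, frame=None) -> None:
        # Usable as a signal handler: it only flips the flag.
        self.accepting.clear()

    def _loop(self) -> None:
        while self.accepting.is_set():  # stop taking new tasks after SIGTERM
            try:
                task = self.tasks.get(timeout=0.05)
            except queue.Empty:
                continue
            result = task()  # a task that was already picked up runs to completion
            with self._lock:
                self.results.append(result)

    def run(self) -> None:
        previous = signal.signal(signal.SIGTERM, self.request_shutdown)
        threads = [threading.Thread(target=self._loop) for _ in range(self.concurrency)]
        for t in threads:
            t.start()
        try:
            # Join with a timeout so the main thread keeps executing Python code
            # and runs the SIGTERM handler promptly.
            while any(t.is_alive() for t in threads):
                for t in threads:
                    t.join(timeout=0.05)
        finally:
            signal.signal(signal.SIGTERM, previous if previous is not None else signal.SIG_DFL)


def demo() -> Tuple[List[str], int]:
    q: "queue.Queue[Callable[[], str]]" = queue.Queue()
    q.put(lambda: (time.sleep(0.5), "slow")[1])  # running when SIGTERM arrives
    q.put(lambda: "never")  # still waiting in the queue at that point
    worker = WarmShutdownWorker(q, concurrency=1)
    # Stand-in for Kubernetes sending SIGTERM to the pod shortly after start.
    threading.Timer(0.1, os.kill, args=(os.getpid(), signal.SIGTERM)).start()
    worker.run()
    return worker.results, q.qsize()


if __name__ == "__main__":
    print(demo())  # (['slow'], 1)
```

   The in-flight task completes and the second task stays in the queue for another worker, which is exactly what a warm shutdown promises as long as nothing sends SIGKILL in the meantime.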
   
   But if some of your tasks may run for longer than terminationGracePeriodSeconds, Kubernetes will send SIGKILL when the grace period expires, before they finish, and:
   - Celery will not be able to wait for all of the running tasks to finish (those will end with a failed status)
   - a worker may manage to pick a task from the queue but not to change its state to running (possibly your case)
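   Whether a running task survives a pod shutdown comes down to comparing its remaining runtime with the grace period. Here is a hypothetical sketch. The function names, the 30-second default (taken from Kubernetes) and the 30-second safety margin are illustrative assumptions, and in practice remaining runtimes are rarely known exactly:

```python
from typing import Dict, List, Tuple


def outcome_after_sigterm(
    remaining_seconds: Dict[str, float], grace_period_seconds: float = 30.0
) -> Tuple[List[str], List[str]]:
    """Split running tasks into (finished, killed) for a given grace period.

    Tasks needing no more than the grace period finish during the warm
    shutdown; the rest are still running when SIGKILL arrives.
    """
    finished = sorted(t for t, r in remaining_seconds.items() if r <= grace_period_seconds)
    killed = sorted(t for t, r in remaining_seconds.items() if r > grace_period_seconds)
    return finished, killed


def suggested_grace_period(remaining_seconds: Dict[str, float], margin: float = 30.0) -> float:
    """Longest remaining runtime plus a safety margin (a rule of thumb only)."""
    return max(remaining_seconds.values(), default=0.0) + margin


if __name__ == "__main__":
    running = {"short_sensor": 5.0, "long_backfill": 900.0}
    print(outcome_after_sigterm(running))  # (['short_sensor'], ['long_backfill'])
    print(suggested_grace_period(running))  # 930.0
```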
   
   Some Celery worker processes may also receive SIGKILL when the container does not have enough memory allocated, which can lead to the same behaviour. Unfortunately, you may not see OOM events in the Kubernetes cluster when this happens. When more than one process is running in a container, the kernel OOM killer picks one of them (the one with the highest OOM score, which largely reflects memory usage), and that is often one of the child processes rather than the main one. Celery runs as a main process plus child processes (the workers).
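   To see which process in a worker container the kernel would most likely pick, you can compare OOM scores. Here is a Linux-only sketch that reads `/proc/<pid>/oom_score` for a process and its direct children. The helper names are hypothetical, and in a real pod you would run it inside the worker container with the pid of the Celery main process:

```python
import os
from pathlib import Path
from typing import Dict, Iterable, List


def child_pids(parent_pid: int) -> List[int]:
    """List direct children of `parent_pid` by scanning /proc (Linux only)."""
    children = []
    for entry in Path("/proc").iterdir():
        if not entry.name.isdigit():
            continue
        try:
            stat = (entry / "stat").read_text()
        except OSError:
            continue  # the process exited while we were scanning
        # Field 2 (comm) is in parentheses and may contain spaces,
        # so split after the last ')': the fields are then state, ppid, ...
        ppid = int(stat.rsplit(")", 1)[1].split()[1])
        if ppid == parent_pid:
            children.append(int(entry.name))
    return children


def oom_scores(pids: Iterable[int]) -> Dict[int, int]:
    """Read /proc/<pid>/oom_score; a higher score means a likelier OOM-kill target."""
    scores = {}
    for pid in pids:
        try:
            scores[pid] = int(Path(f"/proc/{pid}/oom_score").read_text())
        except OSError:
            continue  # process gone
    return scores


if __name__ == "__main__":
    main_pid = os.getpid()  # replace with the Celery main process pid
    for pid, score in oom_scores([main_pid, *child_pids(main_pid)]).items():
        print(pid, score)
```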
   
   If the main process receives SIGKILL, you will probably see an OOM event. If a child process receives it, the tasks it was processing will fail (the logs will show that it received a SIGKILL signal), or a task will stay stuck in the queued state if the child had already picked it up.
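   The main-versus-child distinction is easy to reproduce outside Celery. Here is a minimal POSIX-only sketch using the standard library, with hypothetical helper names. The parent survives and learns from the exit status that its child was killed by SIGKILL, which is similar to what a supervising main process can report in its logs when one of its children dies:

```python
import signal
import subprocess
import sys


def kill_child_and_report() -> int:
    """Start a child process, SIGKILL it as the OOM killer would, return its exit code."""
    # Stand-in for a Celery pool process that is busy with a task.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    child.send_signal(signal.SIGKILL)
    return child.wait()  # negative signal number when the child was killed by a signal


def describe_exit(returncode: int) -> str:
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # The parent keeps running; only the child is gone.
    print(describe_exit(kill_child_and_report()))  # killed by SIGKILL
```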
   
   
   
   

