potiuk commented on issue #22350: URL: https://github.com/apache/airflow/issues/22350#issuecomment-1073945325
It's an interesting one. I think zombies should be restarted when detected. I think you can fine-tune your Celery/Airflow behaviour with https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#scheduler-zombie-task-threshold and [timeout](https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#celery-config-options) to configure the hard limit on waiting for tasks when the Celery worker attempts to shut down. Airflow SHOULD eventually catch up if you wait long enough for all the timeouts to pass, but the best way is to configure your timeouts so that the termination is not forced by K8S, i.e. K8S grace timeout > Celery worker task timeout. Then Celery should have enough time to mark the tasks as failed so that they can be retried by the scheduler much faster.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
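
To make the ordering suggested in the comment concrete (K8S grace timeout > Celery worker task timeout), here is a minimal sketch of a sanity check you could run against your own deployment values. It assumes that "K8S grace timeout" refers to the pod's `terminationGracePeriodSeconds`. The class, field, and function names are hypothetical labels for illustration and are not Airflow or Celery option names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShutdownTimeouts:
    # Hypothetical field names for this sketch; map them to your own settings.
    k8s_grace_period_s: int     # assumed: the pod's terminationGracePeriodSeconds
    celery_task_timeout_s: int  # hard limit the Celery worker waits for running tasks


def check_timeouts(timeouts: ShutdownTimeouts) -> list[str]:
    """Return warnings; an empty list means the ordering from the comment holds."""
    warnings = []
    # If K8S kills the pod first, Celery cannot mark the tasks as failed,
    # and they are only recovered later as zombies.
    if timeouts.k8s_grace_period_s <= timeouts.celery_task_timeout_s:
        warnings.append(
            "K8S grace period "
            f"({timeouts.k8s_grace_period_s}s) should exceed the Celery worker "
            f"task timeout ({timeouts.celery_task_timeout_s}s)"
        )
    return warnings


if __name__ == "__main__":
    for message in check_timeouts(ShutdownTimeouts(30, 300)):
        print("WARNING:", message)
```

The check is deliberately strict (`<=`), on the assumption that equal values still leave Celery no margin to report task state before the pod is killed.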
