hterik commented on issue #27100:
URL: https://github.com/apache/airflow/issues/27100#issuecomment-1290021256

   I've seen tasks get stuck silently inside the `airflow db check` 
command, which is part of the **Entrypoint** of the airflow docker container. 
The entrypoint itself runs the check in a retry loop, CONNECTION_CHECK_MAX_COUNT, 
set to 20, and that count gets multiplied by your connect timeout, which can be 
very long by default, maybe even infinite? I've seen examples where it stays 
stuck here for hours even after the DB has recovered.
   
   If you use KubernetesExecutor, this is the first thing that happens 
whenever a task is started. It doesn't log anything before starting and 
immediately begins probing the database, potentially for a very long time.
   
   See https://github.com/apache/airflow/blob/main/Dockerfile#L952 
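
To illustrate what a bounded version of such a check could look like, here is a minimal Python sketch. It is not the entrypoint's actual code (the entrypoint is a shell script); the function name `run_with_bounded_retries` and all of its parameters are hypothetical. It puts a hard timeout on every attempt and logs each failure, so the total wait is capped at roughly `max_attempts * (attempt_timeout + pause)` and is never silent.

```python
import subprocess
import sys
import time


def run_with_bounded_retries(cmd, max_attempts=20, attempt_timeout=10.0, pause=1.0):
    """Run cmd until it exits with 0, giving up after max_attempts tries.

    Hypothetical helper: each attempt is killed after attempt_timeout
    seconds, so a hanging connection cannot stall the loop indefinitely.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if it has not finished within attempt_timeout seconds.
            result = subprocess.run(cmd, timeout=attempt_timeout)
            if result.returncode == 0:
                return True
            reason = f"exit code {result.returncode}"
        except subprocess.TimeoutExpired:
            reason = f"no answer within {attempt_timeout}s"
        # Log every failed attempt so the wait is visible in the pod logs.
        print(
            f"attempt {attempt}/{max_attempts} failed: {reason}",
            file=sys.stderr,
            flush=True,
        )
        if attempt < max_attempts:
            time.sleep(pause)
    return False
```

Under these assumptions, `run_with_bounded_retries(["airflow", "db", "check"])` would return `False` after a bounded time instead of hanging, and the caller can then exit non-zero so the task pod fails visibly.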
   
   ----------------
   
   Another problem with the scheduler is that if one of the threads inside 
it crashes, the process still keeps running. You need to monitor the scheduler 
heartbeat externally and restart the scheduler whenever it becomes 
unhealthy. This became a lot easier in 2.4, which has a dedicated 
health probe for the scheduler. If this is the problem, it should be visible as a 
banner at the top of the web page.
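
As a rough sketch of the external monitoring described above, the Python function below decides whether a scheduler looks alive from a health payload. It assumes the payload is JSON shaped like the webserver's `/health` response, with a `scheduler` object holding `status` and `latest_scheduler_heartbeat`. The function name `scheduler_is_healthy` and the 30 second threshold are illustrative choices, not something taken from the comment above.

```python
from datetime import datetime, timedelta, timezone


def scheduler_is_healthy(health, max_age=timedelta(seconds=30), now=None):
    """Return True if the payload reports a healthy scheduler with a fresh heartbeat.

    Hypothetical helper: `health` is an already-parsed JSON dict, assumed
    to look like the webserver's /health response.
    """
    now = now or datetime.now(timezone.utc)
    scheduler = health.get("scheduler", {})
    if scheduler.get("status") != "healthy":
        return False
    heartbeat = scheduler.get("latest_scheduler_heartbeat")
    if not heartbeat:
        return False
    last_seen = datetime.fromisoformat(heartbeat)
    if last_seen.tzinfo is None:
        # Assumption: a timestamp without an offset is in UTC.
        last_seen = last_seen.replace(tzinfo=timezone.utc)
    # A process that is still running but has stopped heartbeating
    # is treated as unhealthy, which is the case described above.
    return now - last_seen <= max_age
```

A monitor could fetch the payload on a schedule and restart the scheduler whenever this returns `False`; how the restart is triggered depends on the deployment.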


