getaaron opened a new issue, #32986:
URL: https://github.com/apache/airflow/issues/32986

   ### Apache Airflow version
   
   2.6.3
   
   ### What happened
   
   Due to a problem with our PostgreSQL database (the table statistics were 
stale), this query:
   
   
https://github.com/apache/airflow/blob/1e20ef215ab8e688dc4331513fc5df34db443e84/airflow/jobs/scheduler_job_runner.py#L1686-L1698
   
   took a very long time to return. During this time, heartbeats were not 
written, which caused health check failures (including Kubernetes startup and 
liveness check failures).
   
   It took several days of debugging to track down the cause because Airflow 
does not log any warnings or errors in this case.
   
   ### What you think should happen instead
   
   Airflow should log warnings/errors if queries that are expected to return 
quickly take a long time to return.
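
One possible shape for such a warning, as a minimal sketch using only the Python standard library. This is not Airflow code: `warn_if_slow`, the logger name, and the threshold values are hypothetical names chosen for illustration. A timer thread logs while the wrapped call is still blocked, so the warning would appear even if the query never returns.

```python
import logging
import threading
import time
from contextlib import contextmanager

# Hypothetical logger name, not an existing Airflow logger.
log = logging.getLogger("slow_query_watchdog")


@contextmanager
def warn_if_slow(description, threshold_seconds=5.0):
    """Log a warning if the wrapped block runs longer than threshold_seconds."""
    started = time.monotonic()

    def _still_running():
        # Runs on a separate thread, so it fires even while the block is blocked.
        log.warning("%s still running after %.1fs", description, threshold_seconds)

    timer = threading.Timer(threshold_seconds, _still_running)
    timer.daemon = True  # never keep the process alive just for the watchdog
    timer.start()
    try:
        yield
    finally:
        timer.cancel()  # no-op if the timer has already fired
        elapsed = time.monotonic() - started
        if elapsed > threshold_seconds:
            log.warning(
                "%s took %.1fs (threshold %.1fs)",
                description, elapsed, threshold_seconds,
            )


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # time.sleep stands in for a slow database call.
    with warn_if_slow("example query", threshold_seconds=0.1):
        time.sleep(0.3)
```

Wrapping the scheduler's database calls in something like this would leave a trace in the logs while the query is still hanging, rather than only after it returns.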
   
   ### How to reproduce
   
   1. Make your PostgreSQL database slow (either let the table statistics go 
stale by not running `ANALYZE`, or, for testing, replace the query with 
something like `SELECT pg_sleep(2400);`)
   2. Run the Airflow scheduler
   3. Notice that heartbeats are written infrequently and that no warnings or 
errors are logged
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Versions of Apache Airflow Providers
   
   n/a
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

