SamWheating commented on issue #16023:
URL: https://github.com/apache/airflow/issues/16023#issuecomment-856087959


   We were experiencing a similar issue, but it turned out to be due to the 
SchedulerJob being marked as failed due to a slow heartbeat (due to adding a 
few thousand DAGs).
   
   Can you look for this line in your scheduler logs?
   `Marked %d SchedulerJob instances as failed`
   
   In our case, the solution was to increase the value of 
`scheduler.scheduler_health_check_threshold` in the config, which then 
prevented the schedulerJob from being killed. 
   
   Here's the part in the source code that deals with resetting tasks with 
missing SchedulerJobs:
   
https://github.com/apache/airflow/blob/9c94b72d440b18a9e42123d20d48b951712038f9/airflow/jobs/scheduler_job.py#L1803


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to