SamWheating commented on issue #16023: URL: https://github.com/apache/airflow/issues/16023#issuecomment-856087959
We were experiencing a similar issue, but it turned out to be due to the SchedulerJob being marked as failed due to a slow heartbeat (due to adding a few thousand DAGs). Can you look for this line in your scheduler logs? `Marked %d SchedulerJob instances as failed` In our case, the solution was to increase the value of `scheduler.scheduler_health_check_threshold` in the config, which then prevented the schedulerJob from being killed. Here's the part in the source code that deals with resetting tasks with missing SchedulerJobs: https://github.com/apache/airflow/blob/9c94b72d440b18a9e42123d20d48b951712038f9/airflow/jobs/scheduler_job.py#L1803 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
