nmehraein commented on issue #17507: URL: https://github.com/apache/airflow/issues/17507#issuecomment-922697603
Hello, the release 2.1.4 has been released yesterday with [#17333](https://github.com/apache/airflow/pull/17333) . Could you test with that release ? IMO is really related because all conditions and symptoms are similar (kill at first process heartbeat, and task is **not the first run** because is a DB clean bug between task run. Then more occur on sensor that naturally retry in mode reschedule, but can also occur on normal task when it failing in first attempt). I'm also trying in first fix attempt on our prod to increase scheduler heartbeat that attenuate the problem because the process has more time to start and update the pid in database before timeout, but can already occur when starting process is too long (especially under load), and i also see the scheduler warning "The scheduler does not appear to be running". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
