RNHTTR opened a new pull request, #30375: URL: https://github.com/apache/airflow/pull/30375
I accidentally closed #30108, so this is basically reopening that PR. Some updates:

* There's not really a mechanism to deprecate multiple configs into one. The nature of this change requires some unique deprecation logic, which is implemented in `scheduler_job.py`.
* This deprecates `celery.stalled_task_timeout`, `kubernetes.worker_pods_pending_timeout`, and `celery.task_adoption_timeout`.

closes: #28120
closes: #21225
closes: #28943

Tasks occasionally get stuck in queued and aren't resolved by `stalled_task_timeout` (#28120). This PR moves the logic for handling stalled tasks to the scheduler and simplifies it by marking any task that has been queued for more than `scheduler.task_queued_timeout` as failed, allowing it to be retried if the task has available retries.

This doesn't require an additional scheduler, nor does it allow tasks to get stuck in an infinite loop of scheduled -> queued -> scheduled ... -> queued, as exists in #28943.
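To illustrate the queued-timeout rule described above, here is a minimal, self-contained sketch. It is not the code from this PR and does not use Airflow's models: `QueuedTask`, `retries_left`, `fail_timed_out_queued_tasks`, and the state strings are hypothetical names chosen for the example. It only assumes that the timeout is a number of seconds and that a timed-out task with retries remaining becomes eligible for retry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class QueuedTask:
    # Hypothetical stand-in for a task instance; not Airflow's TaskInstance.
    task_id: str
    queued_at: datetime
    retries_left: int
    state: str = "queued"


def fail_timed_out_queued_tasks(tasks, task_queued_timeout, now):
    """Fail every task queued for longer than task_queued_timeout seconds.

    Returns the ids of the tasks whose state was changed. A task that still
    has retries is moved to "up_for_retry" instead of staying failed.
    """
    cutoff = now - timedelta(seconds=task_queued_timeout)
    changed = []
    for task in tasks:
        # Skip tasks that are not queued or have not yet exceeded the timeout.
        if task.state != "queued" or task.queued_at >= cutoff:
            continue
        if task.retries_left > 0:
            task.retries_left -= 1
            task.state = "up_for_retry"
        else:
            task.state = "failed"
        changed.append(task.task_id)
    return changed


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    tasks = [
        QueuedTask("stuck_with_retry", now - timedelta(seconds=900), retries_left=1),
        QueuedTask("recently_queued", now - timedelta(seconds=30), retries_left=1),
    ]
    print(fail_timed_out_queued_tasks(tasks, task_queued_timeout=600.0, now=now))
```

Because a timed-out task in this sketch only ever moves to a failed or retry state and never back to scheduled, a single pass cannot produce the scheduled -> queued -> scheduled loop mentioned above.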
