lukas-at-harren edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-819411427

@ephraimbuddy I found the root cause for _my_ problem, and a way to reproduce it. Keep in mind my stack (Airflow + KubernetesExecutor), as this issue has been watered down by many different stacks and situations that all end with the same symptoms.

Steps to reproduce:

* Create a DAG and schedule some work for it.
* While work is scheduled, remove the DAG.
* Restart the scheduler.
* Now the DAG no longer exists, but it still exists in the database. And its scheduled tasks also still exist.
* The scheduler dutifully schedules work for the non-existent DAG (<- this is a problem)
* The KubernetesExecutor spawns a new worker pod
* The worker pod is awfully surprised that there is no DAG for the work it was tasked with
* The worker pod commits suicide without telling anybody (<- this is a problem)
* The scheduler faithfully keeps the task in "queued" state, although the worker is no more

Solution:

* The scheduler should not schedule work for tasks that are no longer in the DagBag
* The worker must fail properly (with its task ending in a "failed" state) when it cannot find the DAG + task it was tasked with

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]
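
To make the proposed solution concrete, here is a minimal sketch of the check I have in mind. It is plain Python, not Airflow code: `TaskInstance`, `State`, `split_schedulable` and the DAG names are hypothetical stand-ins for illustration, and the set of known DAG ids stands in for whatever the DagBag currently contains. Marking the orphaned task as failed inside this function is my assumption about where that could happen, not a description of how the scheduler or the worker is implemented.

```python
from dataclasses import dataclass
from enum import Enum


class State(str, Enum):
    # Only the two states discussed in this comment.
    QUEUED = "queued"
    FAILED = "failed"


@dataclass
class TaskInstance:
    # Hypothetical stand-in, not Airflow's TaskInstance model.
    dag_id: str
    task_id: str
    state: State = State.QUEUED


def split_schedulable(task_instances, known_dag_ids):
    """Separate task instances whose DAG is still known from orphaned ones.

    Orphaned task instances are moved to FAILED instead of being left
    in QUEUED, so that nothing waits forever on a DAG that is gone.
    """
    schedulable, orphaned = [], []
    for ti in task_instances:
        if ti.dag_id in known_dag_ids:
            schedulable.append(ti)
        else:
            ti.state = State.FAILED
            orphaned.append(ti)
    return schedulable, orphaned


# One task belongs to a DAG that still exists, the other to a removed DAG.
queued = [
    TaskInstance("etl_daily", "load"),
    TaskInstance("removed_dag", "extract"),
]
schedulable, orphaned = split_schedulable(queued, known_dag_ids={"etl_daily"})
```

With this check, only the `etl_daily` task would be handed to the executor, and the task of `removed_dag` ends up in "failed" instead of staying in "queued".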
