lukas-at-harren commented on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-819411427


   @ephraimbuddy I found the root cause of the problem and a way to reproduce it. Keep my stack in mind (Airflow + KubernetesExecutor): this issue has been muddied by reports from many different stacks and situations that all end with the same symptoms.
   
   Steps to reproduce:
   
   * Create a DAG and schedule some work for it.
   * While work is scheduled, remove the DAG.
   * Restart the scheduler.
   * The DAG file is now gone, but the DAG still exists in the database, and so do its scheduled tasks.
   * The scheduler dutifully schedules work for the non-existent DAG.
   * The KubernetesExecutor spawns a new worker pod.
   * The worker pod is surprised to find no DAG for the work it was given.
   * The worker pod exits without telling anybody.
   * The scheduler faithfully keeps the task in the "queued" state, even though the worker is gone.
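To make the failure mode concrete, here is a toy model of the steps above. It is not Airflow code: `ToyTask`, `buggy_worker`, `buggy_scheduler_pass` and the dict standing in for the DagBag are all hypothetical simplifications. It only shows why a task whose DAG has vanished ends up stuck in "queued" when the worker exits without reporting a state.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical stand-in for the DagBag: dag_id -> set of task_ids.
DagBag = Dict[str, Set[str]]


@dataclass
class ToyTask:
    """Hypothetical, simplified stand-in for a task row in the database."""
    dag_id: str
    task_id: str
    state: str = "scheduled"


def buggy_worker(task: ToyTask, dagbag: DagBag) -> None:
    # Models the reported behaviour: if the DAG is missing, the worker
    # simply exits and never writes a final state back.
    if task.dag_id not in dagbag:
        return
    task.state = "success"


def buggy_scheduler_pass(tasks: List[ToyTask], dagbag: DagBag) -> None:
    # The scheduler queues every scheduled task, whether or not its DAG
    # still exists, and hands it to a worker.
    for task in tasks:
        if task.state == "scheduled":
            task.state = "queued"
            buggy_worker(task, dagbag)


# "removed_dag" was deleted from the DAG folder, but its task row survived.
dagbag: DagBag = {"live_dag": {"t1"}}
tasks = [ToyTask("live_dag", "t1"), ToyTask("removed_dag", "t1")]
buggy_scheduler_pass(tasks, dagbag)
print([(t.dag_id, t.state) for t in tasks])
# → [('live_dag', 'success'), ('removed_dag', 'queued')]
```

Nothing in this loop ever moves the orphaned task out of "queued", which matches the symptom.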
   
   Solution:
   
   * The scheduler should not schedule work for tasks that are no longer in the DagBag.
   * The worker must fail properly (with its task ending in the "failed" state) when it cannot find the DAG and task it was given.
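A minimal sketch of the two proposed fixes, using the same kind of toy model rather than Airflow's real scheduler or worker code. `ToyTask`, `in_dagbag`, `worker` and `scheduler_pass` are hypothetical names, and the DagBag is again just a dict of `dag_id -> task_ids`. The scheduler skips tasks whose DAG or task is gone, and the worker marks its task "failed" instead of exiting silently.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical stand-in for the DagBag: dag_id -> set of task_ids.
DagBag = Dict[str, Set[str]]


@dataclass
class ToyTask:
    """Hypothetical, simplified stand-in for a task row in the database."""
    dag_id: str
    task_id: str
    state: str = "scheduled"


def in_dagbag(task: ToyTask, dagbag: DagBag) -> bool:
    # True only if both the DAG and the task are still known.
    return task.task_id in dagbag.get(task.dag_id, set())


def worker(task: ToyTask, dagbag: DagBag) -> None:
    # Fix 2: if the DAG or task cannot be found, record a failure
    # instead of exiting without a word.
    if not in_dagbag(task, dagbag):
        task.state = "failed"
        return
    task.state = "success"


def scheduler_pass(tasks: List[ToyTask], dagbag: DagBag) -> None:
    for task in tasks:
        if task.state != "scheduled":
            continue
        # Fix 1: do not queue work for tasks no longer in the DagBag.
        if not in_dagbag(task, dagbag):
            continue
        task.state = "queued"
        worker(task, dagbag)


dagbag: DagBag = {"live_dag": {"t1"}}
tasks = [ToyTask("live_dag", "t1"), ToyTask("removed_dag", "t1")]
scheduler_pass(tasks, dagbag)
print([(t.dag_id, t.state) for t in tasks])
# → [('live_dag', 'success'), ('removed_dag', 'scheduled')]

# A task that was already queued before its DAG disappeared now fails visibly.
orphan = ToyTask("removed_dag", "t1", state="queued")
worker(orphan, dagbag)
print(orphan.state)  # → failed
```

The two fixes cover different windows: the scheduler check prevents new orphaned work, while the worker check handles tasks that were already queued when the DAG disappeared.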

