lukas-at-harren edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-819411427


   @ephraimbuddy I found the root cause of _my_ problem, and a way to 
reproduce it. Keep my stack in mind (Airflow + KubernetesExecutor): this 
issue has been muddied by reports from many different stacks and situations 
that all end with the same symptoms.
   
   Steps to reproduce:
   
   * Create a DAG and schedule some work for it.
   * While work is scheduled, remove the DAG.
   * Restart the scheduler.
   * The DAG is now gone, but its record still exists in the database, along 
with its scheduled tasks.
   * The scheduler dutifully schedules work for the non-existent DAG (<- this 
is a problem)
   * The KubernetesExecutor spawns a new worker pod
   * The worker pod is awfully surprised that there is no DAG for the work it 
was tasked with
   * The worker pod terminates itself without telling anybody (<- this is a 
problem)
   * The scheduler faithfully keeps the task in "queued" state, although the 
worker is no more
   
   Solution:
   
   * The scheduler should not schedule work for tasks that are no longer in the 
DagBag
   * The worker must fail properly (with its task ending in a "failed" state) 
when it cannot find the DAG + task it was tasked with
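
To make the two proposed changes concrete, here is a minimal sketch in plain Python. It does not use Airflow's real classes or APIs: `TaskInstance`, `select_schedulable`, `run_in_worker`, and the dict standing in for the DagBag are all hypothetical names made up for illustration, and the real scheduler and worker code paths will look different.

```python
from dataclasses import dataclass


@dataclass
class TaskInstance:
    # Hypothetical stand-in for a task row in the metadata database.
    dag_id: str
    task_id: str
    state: str = "scheduled"


def select_schedulable(task_instances, dagbag_dag_ids):
    """Scheduler side: only hand out work whose DAG is still in the DagBag."""
    known = set(dagbag_dag_ids)
    schedulable, orphaned = [], []
    for ti in task_instances:
        (schedulable if ti.dag_id in known else orphaned).append(ti)
    return schedulable, orphaned


def run_in_worker(ti, dagbag):
    """Worker side: end in "failed" instead of exiting silently.

    `dagbag` is assumed to be a dict of dag_id -> {task_id: callable}.
    """
    tasks = dagbag.get(ti.dag_id)
    if tasks is None or ti.task_id not in tasks:
        # The DAG or task is gone: record the failure so the task
        # does not stay "queued" forever.
        ti.state = "failed"
        return ti.state
    ti.state = "running"
    tasks[ti.task_id]()
    ti.state = "success"
    return ti.state
```

With a dict that contains only the DAGs that still exist, `select_schedulable` keeps the orphaned task away from the executor, and `run_in_worker` acts as a second line of defence if such a task reaches a worker anyway.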
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

