uranusjr commented on a change in pull request #20349:
URL: https://github.com/apache/airflow/pull/20349#discussion_r783616795



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -403,6 +403,15 @@ def _executable_task_instances_to_queued(self, max_tis: int, session: Session =
                     # Many dags don't have a task_concurrency, so where we can avoid loading the full
                     # serialized DAG the better.
                     serialized_dag = self.dagbag.get_dag(dag_id, session=session)
+                    # If the dag is missing, fail the task and continue to the next task.
+                    if not serialized_dag:
+                        self.log.error(
+                            "DAG '%s' for task instance %s not found in serialized_dag table",
+                            dag_id,
+                            task_instance,
+                        )
+                        task_instance.set_state(State.FAILED, session=session)

Review comment:
       > there could be downstream tasks which still attempt to execute, which will then be marked failed by the same check
   
   I think this is a good thing in this case. This code is reached because the DAG declaring those tasks is gone, so it doesn't make sense to execute those tasks IMO.
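
   To make the behavior under discussion concrete, here is a minimal, self-contained sketch of the pattern in the diff: when the serialized DAG for a task instance cannot be found, the task instance is failed and skipped rather than queued, and downstream task instances of the same vanished DAG hit the identical branch. The names (`TaskInstance`, `queue_executable_tis`, a plain dict standing in for the dagbag) are illustrative stand-ins, not the real Airflow classes:

   ```python
   FAILED = "failed"
   QUEUED = "queued"


   class TaskInstance:
       """Illustrative stand-in for airflow.models.TaskInstance."""

       def __init__(self, dag_id, task_id):
           self.dag_id = dag_id
           self.task_id = task_id
           self.state = None


   def queue_executable_tis(task_instances, dagbag):
       """Queue each TI whose DAG can still be resolved; fail the rest.

       ``dagbag`` is any mapping from dag_id to a DAG object; a plain dict
       works for this sketch.
       """
       queued = []
       for ti in task_instances:
           if dagbag.get(ti.dag_id) is None:
               # DAG is gone (e.g. deleted from the serialized_dag table):
               # fail this TI and move on, mirroring the
               # `if not serialized_dag` branch in the diff above.
               ti.state = FAILED
               continue
           ti.state = QUEUED
           queued.append(ti)
       return queued
   ```

   Because every task instance of a deleted DAG fails the same lookup, upstream and downstream TIs alike are failed by this one check, which is the point the comment above is making.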



