ephraimbuddy commented on a change in pull request #20349:
URL: https://github.com/apache/airflow/pull/20349#discussion_r783670984



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -403,6 +403,15 @@ def _executable_task_instances_to_queued(self, max_tis: int, session: Session =
                     # Many dags don't have a task_concurrency, so where we can avoid loading the full
                     # serialized DAG the better.
                     serialized_dag = self.dagbag.get_dag(dag_id, session=session)
+                    # If the dag is missing, fail the task and continue to the next task.
+                    if not serialized_dag:
+                        self.log.error(
+                            "DAG '%s' for task instance %s not found in serialized_dag table",
+                            dag_id,
+                            task_instance,
+                        )
+                        task_instance.set_state(State.FAILED, session=session)

Review comment:
       I will opt for setting all scheduled tasks to None. I doubt that failing the DAG here will really fail it while there are still tasks being executed in the executor; there is a known bug where a DAG marked as failed comes back up as running: https://github.com/apache/airflow/issues/16078
   
   So I propose setting all scheduled task instances to None. The scheduler will no longer move task instances to scheduled once the DAG can no longer be found, and setting them to None makes more sense than failing them, since those task instances won't have any logs.
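
   The behaviour proposed above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual Airflow scheduler code: `TaskInstance`, `SCHEDULED`, and `reset_orphaned_task_instances` here are simplified stand-ins used only to show scheduled task instances being set to `None` (rather than failed) when their DAG is missing from the serialized_dag table.

   ```python
   from dataclasses import dataclass
   from typing import Iterable, Optional, Set

   # Simplified stand-in for Airflow's State.SCHEDULED; illustration only.
   SCHEDULED = "scheduled"

   @dataclass
   class TaskInstance:
       # Minimal stand-in for airflow.models.TaskInstance.
       task_id: str
       dag_id: str
       state: Optional[str] = SCHEDULED

   def reset_orphaned_task_instances(
       task_instances: Iterable[TaskInstance], serialized_dag_ids: Set[str]
   ) -> None:
       """Set state to None for scheduled TIs whose DAG is no longer in the
       serialized_dag table, instead of failing them (they have no logs)."""
       for ti in task_instances:
           if ti.state == SCHEDULED and ti.dag_id not in serialized_dag_ids:
               # The scheduler will not re-queue a TI with a None state.
               ti.state = None

   # Usage: the second TI belongs to a DAG that is no longer serialized.
   tis = [TaskInstance("t1", "existing_dag"), TaskInstance("t2", "deleted_dag")]
   reset_orphaned_task_instances(tis, serialized_dag_ids={"existing_dag"})
   ```

   In this sketch only the orphaned task instance is touched; task instances whose DAG is still present keep their scheduled state.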




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
