SamWheating commented on a change in pull request #22410:
URL: https://github.com/apache/airflow/pull/22410#discussion_r836651441



##########
File path: airflow/api/common/mark_tasks.py
##########
@@ -468,7 +468,22 @@ def set_dag_run_state_to_failed(
         task.dag = dag
         tasks.append(task)
 
-    return set_state(tasks=tasks, run_id=run_id, state=State.FAILED, commit=commit, session=session)
+    # Mark non-finished tasks as SKIPPED.
+    task_ids = [task.task_id for task in dag.tasks]
+    tis = session.query(TaskInstance).filter(
+        TaskInstance.dag_id == dag.dag_id,
+        TaskInstance.run_id == run_id,
+        TaskInstance.task_id.in_(task_ids),
+        TaskInstance.state.not_in(State.finished),
+        TaskInstance.state.not_in(State.running),
+    )

Review comment:
   ```suggestion
       tis = session.query(TaskInstance).filter(
           TaskInstance.dag_id == dag.dag_id,
           TaskInstance.run_id == run_id,
           TaskInstance.state.not_in(State.finished),
           TaskInstance.state.not_in(State.running),
       )
   ```
   
   Is the filter on `task_id` necessary? I'm wondering if it's redundant, since 
we're already filtering on `dag_id` and `run_id`, which should return all 
of the task instances from that DagRun.
   
   I think there are also some race conditions here around changes to the DAG 
while a run is in flight, such that `dag.tasks` might not be completely in sync 
with the TaskInstances that exist in the DB. 
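
To make the concern concrete, here is a minimal sketch of how the two filters can diverge. It uses plain `sqlite3` rather than Airflow's models, and the table layout, the `unfinished_task_ids` helper, the sample rows, and the `FINISHED_OR_RUNNING` tuple are all hypothetical stand-ins, not Airflow's actual schema or state sets. It assumes a task was removed from the DAG file while the run was in flight, so its row still exists in the DB but its `task_id` is no longer in `dag.tasks`.

```python
import sqlite3

# Illustrative stand-in for State.finished plus State.running (assumption,
# not Airflow's actual state sets).
FINISHED_OR_RUNNING = ("success", "failed", "skipped", "running")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (dag_id TEXT, run_id TEXT, task_id TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [
        ("example_dag", "run_1", "extract", "success"),
        ("example_dag", "run_1", "transform", "queued"),
        # Task deleted from the DAG file mid-run; its row is still in the DB.
        ("example_dag", "run_1", "removed_task", "scheduled"),
        # A different run, which must never be touched.
        ("example_dag", "run_2", "extract", "queued"),
    ],
)


def unfinished_task_ids(dag_id, run_id, task_ids=None):
    """Return task_ids of rows that are neither finished nor running."""
    sql = (
        "SELECT task_id FROM task_instance "
        "WHERE dag_id = ? AND run_id = ? "
        f"AND state NOT IN ({', '.join('?' * len(FINISHED_OR_RUNNING))})"
    )
    params = [dag_id, run_id, *FINISHED_OR_RUNNING]
    if task_ids is not None:
        # Optional extra filter, mirroring TaskInstance.task_id.in_(task_ids).
        sql += f" AND task_id IN ({', '.join('?' * len(task_ids))})"
        params.extend(task_ids)
    sql += " ORDER BY task_id"
    return [row[0] for row in conn.execute(sql, params)]


# What dag.tasks would yield after the DAG file was edited (assumption).
current_dag_task_ids = ["extract", "transform"]

with_task_id_filter = unfinished_task_ids("example_dag", "run_1", current_dag_task_ids)
without_task_id_filter = unfinished_task_ids("example_dag", "run_1")

print(with_task_id_filter)
print(without_task_id_filter)
```

In this sketch the `task_id` filter silently leaves `removed_task` in its unfinished state, while filtering on `dag_id` and `run_id` alone picks it up. Whether that row should be marked as skipped is a separate question for the PR; the point is only that the two queries are not equivalent once the DAG and the DB drift apart.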




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


