SamWheating commented on a change in pull request #22410:
URL: https://github.com/apache/airflow/pull/22410#discussion_r836651441
##########
File path: airflow/api/common/mark_tasks.py
##########
@@ -468,7 +468,22 @@ def set_dag_run_state_to_failed(
task.dag = dag
tasks.append(task)
- return set_state(tasks=tasks, run_id=run_id, state=State.FAILED, commit=commit, session=session)
+ # Mark non-finished tasks as SKIPPED.
+ task_ids = [task.task_id for task in dag.tasks]
+ tis = session.query(TaskInstance).filter(
+ TaskInstance.dag_id == dag.dag_id,
+ TaskInstance.run_id == run_id,
+ TaskInstance.task_id.in_(task_ids),
+ TaskInstance.state.not_in(State.finished),
+ TaskInstance.state.not_in(State.running),
+ )
Review comment:
```suggestion
    tis = session.query(TaskInstance).filter(
        TaskInstance.dag_id == dag.dag_id,
        TaskInstance.run_id == run_id,
        TaskInstance.state.not_in(State.finished),
        TaskInstance.state.not_in(State.running),
    )
```
Is the filter on `task_id` necessary? I'm wondering if it's redundant, since
we're already filtering on `dag_id` and `run_id`, which should return all
of the task instances from that DagRun.
I think there are also some race conditions here around changes to
the DAG while a run is in flight, such that `dag.tasks` might not be completely
in sync with the TaskInstances that exist in the DB.
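
To illustrate the concern, here is a minimal sketch using the standard-library `sqlite3` module rather than Airflow's real models. The table layout, the DAG and task names, and the `non_finished_task_ids` helper are all hypothetical, and the list of excluded states is a simplified stand-in for `State.finished` plus `State.running`. It only shows how the two filters can diverge when a task has been removed from the DAG after the run started.

```python
import sqlite3

# Hypothetical, simplified stand-in for the task_instance table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (dag_id TEXT, run_id TEXT, task_id TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [
        ("example_dag", "run_1", "extract", "success"),
        ("example_dag", "run_1", "transform", "queued"),
        # Assumed scenario: this task was removed from the DAG after the run started.
        ("example_dag", "run_1", "old_task", "scheduled"),
        ("example_dag", "run_2", "extract", "queued"),
    ],
)

# Task ids as the currently parsed DAG would report them (old_task is gone).
current_task_ids = ["extract", "transform"]
# Simplified stand-in for the finished and running states.
excluded_states = ["success", "failed", "skipped", "running"]


def non_finished_task_ids(filter_on_task_id):
    """Return task ids of run_1 whose state is not in excluded_states."""
    state_marks = ",".join("?" * len(excluded_states))
    sql = (
        "SELECT task_id FROM task_instance "
        "WHERE dag_id = ? AND run_id = ? "
        f"AND state NOT IN ({state_marks})"
    )
    params = ["example_dag", "run_1", *excluded_states]
    if filter_on_task_id:
        id_marks = ",".join("?" * len(current_task_ids))
        sql += f" AND task_id IN ({id_marks})"
        params += current_task_ids
    sql += " ORDER BY task_id"
    return [row[0] for row in conn.execute(sql, params)]


with_filter = non_finished_task_ids(filter_on_task_id=True)
without_filter = non_finished_task_ids(filter_on_task_id=False)
print(with_filter)     # ['transform']
print(without_filter)  # ['old_task', 'transform']
```

In this sketch, the extra `task_id` filter changes the result only when the DB holds a task instance that the parsed DAG no longer knows about; otherwise the `dag_id` and `run_id` filters already select the same rows.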