dstandish commented on PR #29406: URL: https://github.com/apache/airflow/pull/29406#issuecomment-1658842100
> One suggestion of how to approach this was to modify `base_executor` to kill all tasks within a DAG once one task fails. The problem I found with this approach is that, first, the various executors each track which tasks are running, queued, or completed differently, making it hard to write a unified function that kills all running tasks across every executor type. More importantly, my tests showed that the executor often lacks full information about all the tasks in a DAG run when one task fails. That is, when a task fails, the executor may not yet know about every task currently running or queued within that DAG, and so cannot properly fail the DAG.

Yeah, it does seem that the current approach allows for races between the scheduler and the task. What if TIs are expanded after `tis = self.get_dagrun(session).get_task_instances()` is called and before anything is killed? Separately, what if we skip a TI here and the scheduler then sets it to queued? I'm not sure how often this kind of thing would manifest, but the conditions aren't hard to imagine.
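To make the race concrete, here is a minimal, purely illustrative sketch (not Airflow's actual API; `FakeDagRun` and `fail_fast` are hypothetical stand-ins): a task-level fail-fast handler snapshots the run's task instances, but if the scheduler expands a mapped task after the snapshot is taken and before the kill, that TI escapes:

```python
# Hypothetical sketch of the stale-snapshot race described above.
# None of these names are real Airflow classes or methods.

class FakeDagRun:
    """Stand-in for a DAG run; maps task-instance ids to states."""
    def __init__(self):
        self.tis = {"t1": "running", "t2": "queued"}

    def get_task_instances(self):
        # Snapshot of the TIs known *right now*.
        return list(self.tis)


def fail_fast(dagrun, snapshot):
    """Kill every TI in the snapshot; anything added later is missed."""
    for ti in snapshot:
        dagrun.tis[ti] = "failed"


run = FakeDagRun()
snapshot = run.get_task_instances()  # snapshot is ["t1", "t2"]

# Scheduler expands a mapped task between the snapshot and the kill.
run.tis["t3_mapped"] = "queued"

fail_fast(run, snapshot)

# t3_mapped is still "queued": a live TI survived the fail-fast pass.
print(run.tis)
```

The same shape covers the second race: a TI the handler decided to skip can be flipped to queued by the scheduler after the decision is made, since nothing serializes the two writers.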
