dstandish commented on PR #29406: URL: https://github.com/apache/airflow/pull/29406#issuecomment-1658842100
> One suggestion of how to approach this was to modify `base_executor` to kill all tasks within a DAG once one task fails. The problem I found with this approach is that, first, the various executors each track which tasks are running, queued, or completed differently, making it hard to write a unified function that kills all running tasks across every executor type. More importantly, my tests showed that the executor often lacks full information about all the tasks in a DAG run when one task fails. That is, when a task fails, the executor may not yet know about every task currently running or queued within that DAG, and so cannot properly fail the DAG.

Yeah, it does seem that the current approach allows for races between the scheduler and the task. What if TIs are expanded after `tis = self.get_dagrun(session).get_task_instances()` is called and before anything is killed? Separately, what if we skip a TI here and the scheduler then sets it to queued? I'm not sure how often this kind of thing would manifest, but the conditions aren't hard to imagine.
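To make the race concrete, here is a minimal, purely illustrative sketch (not Airflow's actual API; `FakeDagRun` and `fail_fast` are hypothetical stand-ins): a task-level fail-fast handler snapshots the run's task instances, but if the scheduler expands a mapped task after the snapshot is taken and before the kill, that TI escapes:

```python
# Hypothetical sketch of the stale-snapshot race described above.
# None of these names are real Airflow classes or methods.

class FakeDagRun:
    """Stand-in for a DAG run; maps task-instance ids to states."""
    def __init__(self):
        self.tis = {"t1": "running", "t2": "queued"}

    def get_task_instances(self):
        # Snapshot of the TIs known *right now*.
        return list(self.tis)


def fail_fast(dagrun, snapshot):
    """Kill every TI in the snapshot; anything added later is missed."""
    for ti in snapshot:
        dagrun.tis[ti] = "failed"


run = FakeDagRun()
snapshot = run.get_task_instances()  # snapshot is ["t1", "t2"]

# Scheduler expands a mapped task between the snapshot and the kill.
run.tis["t3_mapped"] = "queued"

fail_fast(run, snapshot)

# t3_mapped is still "queued": a live TI survived the fail-fast pass.
print(run.tis)
```

The same shape covers the second race: a TI the handler decided to skip can be flipped to queued by the scheduler after the decision is made, since nothing serializes the two writers.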
