jayesh-shaw-glean opened a new issue, #33096:
URL: https://github.com/apache/airflow/issues/33096

   ### Description
   
   Currently Airflow supports a number of trigger rules, all of which cater to 
various use cases. Notably, though, all of the existing trigger rules depend only 
on a task's direct parents. It would be nice to have trigger rules that let a task 
depend on all of its ancestors, not only its direct parents.
   
   ### Use case/motivation
   
   As an example, take the following DAG:
   ```
   PRE_PROCESS >> A >> B >> C
   ```
   
   This DAG takes a list of tasks it needs to run and skips the tasks that do 
not need to run. This is achieved via the PRE_PROCESS node: it takes the list 
of tasks that need to run, compares it against the list of all tasks in the 
DAG to construct the list of tasks that should not run, and then leverages the 
`SkipMixin.skip()` function to skip them.
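
   The skip computation described above can be sketched in plain Python. This is 
an illustrative sketch, not the author's actual code; in a real DAG the resulting 
list would be handed to `SkipMixin.skip()` (e.g. from a `PythonOperator` 
callable), and the helper name here is hypothetical.

   ```python
   # Illustrative sketch of the PRE_PROCESS logic: given the tasks
   # requested for this run and the full task list, compute the
   # complement (the tasks to skip). Names are hypothetical.

   def tasks_to_skip(all_tasks, requested):
       """Return the tasks in the DAG that should be skipped this run."""
       requested = set(requested)
       return [t for t in all_tasks if t not in requested]

   # PRE_PROCESS would then pass this list to SkipMixin.skip() so the
   # scheduler marks those task instances as skipped.
   ```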
   
   The issue: consider a scenario where we need to run only A and C. In this 
case, the PRE_PROCESS operator would skip B, and the DAG state would look like 
the following:
   
   ```
   PRE_PROCESS (success) >> A (queued) >> B (skipped) >> C (queued)
   ```
   
   None of the current trigger rules let us ensure that C runs only after A 
succeeds; C's trigger can only depend on B's status. For example, if the 
operators have the `none_failed` trigger rule, the state of the DAG would 
become:
   
   ```
   PRE_PROCESS (success) >> A (running) >> B (skipped) >> C (running)
   ```
   
   It would thus be nice to have a trigger_rule that caters to the above use case.
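
   To make the proposal concrete, here is a minimal sketch (in plain Python, 
not against Airflow internals) of how an ancestor-aware rule might evaluate: a 
task may run once every non-skipped ancestor has succeeded, so skipped tasks 
are looked through rather than terminating the check at the direct parent. All 
names here are hypothetical.

   ```python
   # Hypothetical sketch of an ancestor-aware trigger rule. The DAG is
   # represented as a mapping from each task to its direct upstream tasks.

   def ancestors(task, upstream):
       """All direct and indirect upstream tasks of `task`."""
       seen, stack = set(), list(upstream.get(task, []))
       while stack:
           t = stack.pop()
           if t not in seen:
               seen.add(t)
               stack.extend(upstream.get(t, []))
       return seen

   def can_run(task, upstream, states):
       """True once every non-skipped ancestor has succeeded."""
       return all(
           states.get(t) == "success"
           for t in ancestors(task, upstream)
           if states.get(t) != "skipped"
       )
   ```

   With the example DAG, `can_run("C", ...)` stays False while A is still 
running and becomes True once A succeeds, even though C's direct parent B was 
skipped.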
   
   ### Current workaround:
   
   Our use case involves a DAG with ~200 tasks, and at times we might need to 
run only 2-4 operations (which may or may not depend on each other directly or 
indirectly). To solve the problem above, we:
      - Set the trigger_rule of all tasks to `none_failed`
      - Communicate the list of tasks that need to be run to all tasks via XCom
      - Have each task determine whether it needs to run and, if not, raise 
`AirflowSkipException`
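
   The per-task check in the last step can be sketched as follows. The stand-in 
exception class only makes the snippet self-contained; a real task would import 
`AirflowSkipException` from `airflow.exceptions` and pull the run list from 
XCom (identifiers here are illustrative, not our exact code).

   ```python
   class AirflowSkipException(Exception):
       """Stand-in for airflow.exceptions.AirflowSkipException so this
       sketch runs without an Airflow installation."""

   def maybe_skip(task_id, tasks_to_run):
       """Raise the skip exception when this task is not requested.

       In the workaround, `tasks_to_run` would be pulled from XCom
       (published by PRE_PROCESS) inside the task's callable.
       """
       if task_id not in tasks_to_run:
           raise AirflowSkipException(f"{task_id} not requested for this run")
   ```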
      
   The above solution works well, but an issue arises when we run at scale 
(say, 100-200 concurrent runs of the same DAG, each running only 2-4 
operations). In such cases, every operation that needs to be skipped still 
consumes a worker slot for a little while, which at scale slows the DAG down 
considerably and is, moreover, an overuse of resources.
   
   Happy to raise a PR for this, but I wanted to know whether this problem has 
been discussed previously. Since I am unfamiliar with Airflow's source code, I 
would also love to hear from people familiar with the stack about the 
feasibility of implementing this feature.
   
   Thank you!
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   
