This is a moved discussion from a GitHub Pull request (as Bolke suggested):
I'll try to copy paste the discussion as best I can: *PR*: https://github.com/apache/incubator-airflow/pull/1961 *Original commit message:* This commit fixes the premature finishing of DAGs when using the branching operator. The fix is making sure a operator in a DAG is marked as SKIPPED only when all the upstream operators are skipped. *Remark for Max:* I thought the desired behavior was to propagate the skipping so that the DAG run would end and allow the scheduler to move on. Ultimately your DAG will stop when concurrent number of DAG runs is reached. Whether the skipping state propagates or not, user action need to be taken for things to move forward. I think it's fair to think that people may expect either behavior. People may have designed existing DAGs based on how this feature behaves. I think the main reason I'd prefer the current behavior is that is allows the scheduler to move forward. In either case the branching section in the concepts part of the docs should be more clear about using the trigger rule one_success in the joining task. It should also be clear about how user should expect the skipped state to propagate. Thinking about it, the trigger rule all_success in a joining task is not logical. We could almost raise when/if detecting it. *Reply from Alex:* My understanding was that skipped == success, because when a skipped reaches the end the DAG is marked as success (even with half of my tasks unfinished). And actually this commit 2630361 <https://github.com/apache/incubator-airflow/commit/2630361ca24737c28f458825b20ab11c9c996b17>convinced my that I had it right. That's why I wrote this PR. I understand your point because you actually look at it from the schedulers point of view, I look at it from the users point of view. I already know 2 people in my team that fell into the trap with join nodes. And documenting it alone isn't enough, I would then be prepared to change this PR into a raise when we detect this with ALL_SUCCESS trigger. Still the first DAG I wrote will never work with the only alternative I have (ONE_SUCCESS), because the branch operator starts a branch that fans out to a lot of parallel tasks that all join together to the same node where the skipped branch arrive. This will never work with the ONE_SUCESS because as soon as one of the parallel branches succeed the rest will just stop. I know I can hack around this by introducing a DummyOperator where all successful branches arrive, but I feel dirty if I need to do this. So I propose this: - I'll change this PR so the ALL_SUCCESS raises and error. - Start on another PR that introduces 2 new triggers: -- ALL_SUCCESS_OR_SKIPPED -- ALL_FAILED_OR_SKIPPED What do you think? -- _/ _/ Alex Van Boxel
