[
https://issues.apache.org/jira/browse/AIRFLOW-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333074#comment-16333074
]
David Stuck commented on AIRFLOW-1296:
--------------------------------------
I'm running into an issue where skipped states are not propagating when there
are multiple branching operators. Here's an example that fails for me in 1.8.2:
[https://gist.github.com/dstuck/b6f750d0f8f3556b98e04c407a4cb166]
It appears to be a race condition with the deadlock marking; removing the deadlock
code, or just adding extra dummy nodes after the BRANCH_O operator in the gist
above, allows the SKIPPED state to propagate and the DAG succeeds.
Note that Tylar's proposed solution of recursively marking states as SKIPPED is
not acceptable, since a skipped branch can be upstream of operators that still
need to run, as in the example listed
[here|https://airflow.apache.org/concepts.html#branching]. To handle this
properly, I think you only want to skip a task when all of its upstream tasks
are skipped.
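A minimal sketch of that rule, in plain Python rather than Airflow code. The
function name, the dict-based graph, and the task names are all made up for
illustration; this is not how the scheduler stores or updates task state.

```python
SKIPPED = "skipped"
SUCCESS = "success"


def propagate_skips(upstream, states):
    """Skip an undecided task only when ALL of its upstream tasks are skipped.

    upstream: dict of task_id -> list of upstream task_ids (hypothetical layout)
    states:   dict of task_id -> state; a missing key or None means undecided
    Returns a new dict; the input is not modified.
    """
    result = dict(states)
    changed = True
    # Repeat until a full pass makes no change, so skips cascade down chains.
    while changed:
        changed = False
        for task, parents in upstream.items():
            if result.get(task) is not None or not parents:
                continue  # already decided, or a root task
            if all(result.get(p) == SKIPPED for p in parents):
                result[task] = SKIPPED
                changed = True
    return result


# A join below one skipped branch and one live branch stays undecided.
branching = {
    "branch_op": [],
    "live_branch": ["branch_op"],
    "dead_branch": ["branch_op"],
    "join": ["live_branch", "dead_branch"],
}
after = propagate_skips(branching, {"branch_op": SUCCESS, "dead_branch": SKIPPED})
print(after.get("join"))
```

Under this rule the join is left for the scheduler to run, whereas a purely
recursive "skip everything downstream" approach would have skipped it.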
> DAGs using operators involving cascading skipped tasks fail prematurely
> -----------------------------------------------------------------------
>
> Key: AIRFLOW-1296
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1296
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 1.8.1
> Reporter: Daniel Huang
> Assignee: Bolke de Bruin
> Priority: Blocker
> Fix For: 1.8.2
>
>
> So this is basically the same issue as AIRFLOW-872 and AIRFLOW-719. A
> workaround had fixed this
> (https://github.com/apache/incubator-airflow/pull/2125), but was later
> reverted (https://github.com/apache/incubator-airflow/pull/2195). I totally
> agree with the reason for reverting, but I still think this is an issue.
> The issue affects any operator that involves cascading skipped tasks,
> like ShortCircuitOperator or LatestOnlyOperator. These operators mark only
> their *direct* downstream tasks as SKIPPED, and it is left up to the
> scheduler to cascade the SKIPPED state to tasks further downstream of
> those skipped tasks (see latest only op docs about this expected behavior
> https://airflow.incubator.apache.org/concepts.html#latest-run-only).
> Instead, the scheduler marks the DAG run as FAILED prematurely, before it
> has had a chance to skip all downstream tasks.
> This example DAG should reproduce the issue:
> https://gist.github.com/dhuang/61d38fb001c3a917edf4817bb0c915f9.
> Expected result: DAG succeeds with tasks - latest_only (success) -> dummy1
> (skipped) -> dummy2 (skipped) -> dummy3 (skipped)
> Actual result: DAG fails with tasks - latest_only (success) -> dummy1
> (skipped) -> dummy2 (none) -> dummy3 (none)
> I believe the results I'm seeing are because of this deadlock prevention
> logic,
> https://github.com/apache/incubator-airflow/blob/1.8.1/airflow/models.py#L4182.
> While the actual result shown above _could_ mean a deadlock, in this case
> it isn't one. Since this {{update_state}} logic is reached first in each
> scheduler run, dummy2/dummy3 don't get a chance to have the SKIPPED state
> cascaded to them. Commenting out that block gives me the results I expect.
> [~bolke] I know you spent a while trying to reproduce my issue and weren't
> able to, but I'm still hitting this on a fresh environment, default configs,
> sqlite/mysql dbs, local/sequential/celery executors, and 1.8.1/master.