[
https://issues.apache.org/jira/browse/AIRFLOW-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677006#comment-16677006
]
Ash Berlin-Taylor commented on AIRFLOW-3285:
--------------------------------------------
The lazy feature as you have described it isn't something we'd accept, as it's
quite a behaviour change and a little bit of a workaround, but a combo
trigger rule, so that we could say e.g. {{trigger_rule=\{'all_done','one_failed',\}}}
to mean "trigger on any of these conditions", would be acceptable.
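The set-valued {{trigger_rule}} suggested above reads naturally as OR-ing the individual rules. A minimal sketch of that evaluation, assuming illustrative state names and rule functions (this is hypothetical, not current Airflow API):

```python
# Hypothetical sketch of a combined trigger_rule with OR semantics.
# State names and rule functions are illustrative, not Airflow internals.

TERMINAL = {"success", "failed", "upstream_failed", "skipped"}

def rule_all_done(upstream_states):
    # Satisfied once every upstream task has reached a terminal state.
    return all(s in TERMINAL for s in upstream_states)

def rule_one_failed(upstream_states):
    # Satisfied as soon as any upstream task has failed.
    return any(s == "failed" for s in upstream_states)

RULES = {"all_done": rule_all_done, "one_failed": rule_one_failed}

def should_trigger(trigger_rule, upstream_states):
    # trigger_rule may be a single rule name or a set of names;
    # with a set, the task triggers when ANY member rule is satisfied.
    names = {trigger_rule} if isinstance(trigger_rule, str) else trigger_rule
    return any(RULES[name](upstream_states) for name in names)
```

With this reading, a cluster-delete task using {{\{'all_done','one_failed'\}}} would fire either when everything upstream has finished or as soon as the first failure appears, whichever comes first.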
> lazy marking of upstream_failed task state
> ------------------------------------------
>
> Key: AIRFLOW-3285
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3285
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Kevin McHale
> Priority: Minor
>
> Airflow aggressively applies the {{upstream_failed}} task state: as soon as a
> task fails, all of its downstream dependencies get marked. This sometimes
> creates problems for us at Etsy.
> In particular, we use a pattern for our Hadoop Airflow DAGs along these lines:
> # the DAG creates a hadoop cluster in GCP/Dataproc
> # the DAG executes its tasks on the cluster
> # the DAG deletes the cluster once all tasks are done
> There are some cases in which the tasks immediately upstream of the
> cluster-delete step get marked as {{upstream_failed}}, triggering the
> cluster-delete step, even while other tasks continue to execute without
> problems on the cluster. The cluster-delete step of course kills all of the
> running tasks, requiring all of them to be re-run once the problem with the
> failed task is mitigated.
> As an example, a DAG that looks like this can exhibit the problem:
> {code:python}
> Cluster = ClusterCreateOperator(...)
> A = Job1Operator(...)
> Cluster >> A
> B = Job2Operator(...)
> Cluster >> B
> C = Job3Operator(...)
> A >> C
> B >> C
> ClusterDelete = DeleteClusterOperator(trigger_rule="all_done", ...)
> C >> ClusterDelete{code}
> In a DAG like this, suppose task A fails while task B is running. Task C
> will immediately be marked {{upstream_failed}}, which causes ClusterDelete
> to run while task B is still running, which in turn causes task B to fail
> as well.
> Our solution to this problem has been to implement something like [this
> diff|https://github.com/mchalek/incubator-airflow/commit/585349018656cd9b2e3e3e113db6412345485dde],
> which lazily applies the {{upstream_failed}} state only to tasks for which
> all upstream tasks have already completed.
> The consequence in terms of the example above is that task C will not be
> marked {{upstream_failed}} in response to task A failing until task B
> completes, ensuring that the cluster is not deleted while any upstream tasks
> are running.
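The lazy-marking rule in the linked diff can be paraphrased as: only mark a task {{upstream_failed}} once all of its upstream tasks are in a terminal state. A small sketch comparing the two policies, assuming simplified state names (this is illustrative, not the actual scheduler code from the diff):

```python
# Illustrative comparison of eager vs. lazy upstream_failed marking.
# State names are simplified; this is not Airflow scheduler code.

TERMINAL = {"success", "failed", "upstream_failed", "skipped"}
FAILED = {"failed", "upstream_failed"}

def eager_mark(upstream_states):
    # Current behavior: mark as soon as any upstream task has failed.
    return any(s in FAILED for s in upstream_states)

def lazy_mark(upstream_states):
    # Proposed behavior: wait until every upstream task is terminal,
    # then mark only if at least one of them failed.
    return (all(s in TERMINAL for s in upstream_states)
            and any(s in FAILED for s in upstream_states))
```

In the example DAG, while A has failed and B is still running, C's upstream states are {{['failed', 'running']}}: eager marking flags C immediately (triggering the cluster delete under B), while lazy marking waits for B to finish first.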
> We have found this to cause no adverse behavior on our Airflow instances, so
> we run all of them with this lazy-marking feature enabled. However, we
> recognize that a change in behavior like this may be something that existing
> users will want to opt in to, so we included a config flag in the diff that
> defaults to the original behavior.
> We would appreciate your consideration of incorporating this diff, or
> something like it, to allow us to configure this behavior in unmodified,
> upstream Airflow.
> Thanks!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)