I asked a very similar question last month and got no responses. Note that
SubDags execute backfill commands in 1.8.0. The original text of that
question is as follows:

I've recently upgraded to 1.8.0 and immediately encountered the hanging
SubDag issue that's been mentioned. I'm not sure the rollback from rc5 to
rc4 fixed the issue. For now, I've removed all SubDags and put their
task_instances in the main DAG.

Assuming this issue gets fixed, how is one supposed to recover from
failures within SubDags after the number of retries has been exhausted?
Previously, I would clear the state of the offending tasks and run a
backfill job. Backfill jobs in 1.7.1 would skip successful task_instances
and only run the task_instances with cleared states. Now, backfills and
SubDagOperators also clear the state of successful tasks. I'd rather not
re-run a task that already succeeded. I tried running backfills with
--task_regex and --ignore_dependencies, but that doesn't quite work either.
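To make the difference concrete, here is a toy model of which task
instances each behavior would run for a single execution date. It is plain
Python, not Airflow code; `tasks_to_run` and `skip_successful` are
hypothetical names used only for illustration.

```python
# Toy model only: these names are hypothetical and are NOT Airflow APIs.
SUCCESS = "success"


def tasks_to_run(states, skip_successful):
    """Return the task ids a backfill would (re)run for one execution date.

    skip_successful=True models the "fill in the blanks" behavior described
    for 1.7.1 (cleared tasks run, successful ones are skipped).
    skip_successful=False models resetting every task instance, as described
    for 1.8.0, so successful tasks run again.
    """
    return [
        task_id
        for task_id, state in states.items()
        if not (skip_successful and state == SUCCESS)
    ]


# t1 succeeded; t2 and t3 were cleared (no state).
states = {"t1": SUCCESS, "t2": None, "t3": None}
print(tasks_to_run(states, skip_successful=True))   # ['t2', 't3']
print(tasks_to_run(states, skip_successful=False))  # ['t1', 't2', 't3']
```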

If I have t1(success) -> t2(clear) -> t3(clear) and I set --task_regex so
that it excludes t1, then t2 will run, but t3 will never run because it
doesn't wait for t2 to finish. It fails because its upstream dependency
condition is not met.
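What I'd expect from a "fill in the blanks" run is that t3 waits for t2.
The sketch below is a hypothetical toy scheduler, not how Airflow's backfill
is actually implemented: `check_once` evaluates each selected task's
upstream condition a single time (roughly the failure I'm seeing), while
`run_in_dependency_order` skips successful tasks and only starts a task once
all of its upstream tasks have succeeded.

```python
from typing import Dict, List, Optional

# Hypothetical toy model; none of these names come from Airflow.
SUCCESS = "success"


def check_once(upstream: Dict[str, List[str]],
               states: Dict[str, Optional[str]],
               selected: List[str]) -> Dict[str, bool]:
    """Evaluate each selected task's upstream condition exactly once."""
    return {
        t: all(states[u] == SUCCESS for u in upstream.get(t, []))
        for t in selected
    }


def run_in_dependency_order(upstream: Dict[str, List[str]],
                            states: Dict[str, Optional[str]]) -> List[str]:
    """Run every non-successful task once all of its upstream tasks succeed.

    Each "run" just marks the task successful. Returns the execution order.
    """
    states = dict(states)  # don't mutate the caller's dict
    order: List[str] = []
    while True:
        ready = [
            t for t, s in states.items()
            if s != SUCCESS
            and all(states[u] == SUCCESS for u in upstream.get(t, []))
        ]
        if not ready:
            break
        for t in ready:
            states[t] = SUCCESS
            order.append(t)
    return order


# t1(success) -> t2(clear) -> t3(clear)
upstream = {"t2": ["t1"], "t3": ["t2"]}
states = {"t1": SUCCESS, "t2": None, "t3": None}
print(check_once(upstream, states, ["t2", "t3"]))  # {'t2': True, 't3': False}
print(run_in_dependency_order(upstream, states))   # ['t2', 't3']
```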

I like the logical grouping that SubDags provide, but I don't want to
retry all tasks, even the ones that already succeeded. I can see why one
would want that behavior in some cases, but it's certainly not useful in all.

On Tue, Apr 18, 2017 at 6:45 PM, Chris Fei <[email protected]> wrote:

> Hi all,
>
> I'm new to Airflow, and I'm looking for someone to clarify the expected
> behavior of running a backfill with regard to previously successful
> tasks. When I run a backfill on 1.8.0, tasks that were previously run
> successfully are re-run for me. Is it expected that backfills re-run all
> tasks, even those that were marked as successful? For reference, the
> command I'm running is `airflow backfill -s 2017-04-01 -e 2017-04-03
> Tutorial`.
>
> I wasn't able to find anything in the documentation to indicate either
> way. Some brief research revealed that invoking backfill was meant at
> one point to "fill in the blanks", which I interpret to mean "only run
> tasks that were not completed successfully". On the other hand, the
> code *does* seem to explicitly set all task instances for a given DAGRun
> to SCHEDULED (see [AIRFLOW-910][1] and
> https://github.com/apache/incubator-airflow/pull/2107/files#diff-54a57ccc2c8e73d12c812798bf79ccb2R1816).
>
>
> Apologies for such a fundamental question; I just want to make sure I'm
> not missing something obvious here. Can someone clarify?
>
> Thanks,
>
> Chris Fei
>
> Links:
>
>   1. https://issues.apache.org/jira/browse/AIRFLOW-910
>
