Actually, it's not really possible, by design. Airflow requires that your
script define a single DAG shape per DAG object, each with a unique
corresponding `dag_id`. You cannot really express the use case where your
DAG would have a different shape over time (or depending on a DagRun
payload) using a single DAG object.

Allowing this would not only break the semantics of the name "DAG" (making
one DAG object represent many different DAGs depending on context), but also
bring much more complexity that may be hard to comprehend, or simply to
visualize from a UI perspective. For example, if you look at the "tree
view", you can imagine that such a [useful] view wouldn't work for a DAG
whose shape changes a lot from run to run.

Knowing this, here are some options for working around heterogeneous DAG
shapes:
* For slow-changing DAGs where a few tasks may be added or removed over
time, you can generally manage that by being careful about the
start_date/end_date of each task, and perhaps populating some historical
state when needed (backfill with or without mark_success, depending on your
use case). The typical way to approach a DAG shape change involves pausing
the DAG, setting the right start date, altering state if/where needed, and
unpausing the DAG.
* Using templates or PythonOperator, you can force tasks to not run and
just succeed based on conditions, essentially skipping tasks on arbitrary
criteria. The DAG shape stays the same, but tasks are instructed to skip
and succeed based on context.
* If the shape is very heterogeneous across runs, we recommend that you
instantiate different "singleton" DAGs, each with a distinct `dag_id` and
`schedule_interval='@once'`; each dag_id is expected to run a single time
and can have a distinct shape.
* For a major break in shape over time, where the shape is homogeneous
before a big change, then there's a major change, then it's homogeneous
again, you may want to keep the before and after DAGs around as 2 different
objects, with respective start_date/end_date/dag_id ranges that do not
overlap. Then use either DAG when backfilling, applying the proper logic to
the right date range.
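The "skip based on context" option above can be sketched as a plain
condition callable. All the names and conf keys below are my own, not from
the thread; in a real DAG file you would pass such a callable as the
`python_callable` of a ShortCircuitOperator (returning False skips
downstream tasks), or raise a skip exception from a PythonOperator.

```python
# Sketch of the skip-on-condition pattern (hypothetical names).
# The DAG shape stays fixed; individual runs skip the branches
# they don't need, based on the payload passed via `trigger_dag -c`.

def should_process(table, **context):
    """Decide at runtime whether this task's work applies to this run.

    In Airflow, `context['dag_run'].conf` would carry the `-c` payload;
    here we stand it in with a plain `dag_run_conf` dict for illustration.
    """
    conf = context.get("dag_run_conf") or {}
    return table in conf.get("tables", [])

# Hypothetical wiring in a real DAG file (Airflow imports omitted):
#
# check_orders = ShortCircuitOperator(
#     task_id="check_orders",
#     python_callable=should_process,
#     op_kwargs={"table": "orders"},
#     provide_context=True,
#     dag=dag,
# )
```

The callable itself is just Python, so the decision logic can be unit
tested outside of Airflow.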

So essentially the constraint is that a DAG is a single Directed Acyclic
Graph, not a collection of DAGs that depend on input parameters (which is
logical given the object's name). You can easily build a DAG factory as a
function that spits out different DAG objects based on params, but the
constraint is that each has a unique `dag_id`.
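A minimal sketch of the dag_id side of such a factory (function and
parameter names are mine, not from the thread): each parameterization must
map to a unique, stable `dag_id`, and the real factory would build a
`DAG(dag_id=...)` and expose it at module level so the scheduler's file
parser can discover it.

```python
# Sketch of a DAG factory's id scheme (hypothetical names).
# The point is the constraint described above: one unique, stable
# dag_id per distinct DAG shape the factory emits.

def make_dag_id(family, params):
    """Derive a deterministic dag_id from a family prefix and parameters.

    Sorting the parameter keys keeps the id stable across parses, so the
    same parameterization always maps to the same DAG object.
    """
    suffix = "_".join(f"{k}-{params[k]}" for k in sorted(params))
    return f"{family}__{suffix}"

# A real factory would then do something like (Airflow wiring omitted):
#
# def dag_factory(family, params):
#     dag = DAG(dag_id=make_dag_id(family, params),
#               schedule_interval="@once")
#     ...add tasks based on params...
#     return dag
#
# for p in parameter_sets:
#     globals()[make_dag_id("etl", p)] = dag_factory("etl", p)
```

Prefixing with the family name also gives you the cheap "DAG family"
convention mentioned below.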

Note that it could be interesting to have the notion of a "DAG Family" that
could represent a set of DAGs that have something in common (for example,
that they are generated from the same DAG factory). Unfortunately,
introducing a new entity (DAGFamily) may represent a significant amount of
work. It's also unclear how introducing this notion would help beyond what
we get from simple conventions like prefixing the dag_id with something
that represents the DAG family.

Max

On Tue, May 30, 2017 at 7:30 AM, Scott Halgrim <[email protected]>
wrote:

> I think so. It’s not completely clear what you want to do with those
> different tasks but you should be able to create those tasks with a factory
> method. We have a subdag whose tasks vary depending on how many tables it
> finds in our database (one task per table).
>
> Scott
>
> On May 30, 2017, 7:21 AM -0700, Leroy Julien <[email protected]>,
> wrote:
> > Hi,
> >
> > I would like to know if it’s possible to make a DAG with a variable
> number of tasks depending on a parameter given to the 'trigger_dag -c’
> command.
> >
> > Thanks
> > Julien
>
