Actually it's not really possible, by design. Airflow expects your script to define a single DAG shape for each DAG object, identified by a unique `dag_id`. There's no way to express a DAG whose shape changes over time (or based on a DagRun payload) using a single DAG object.
Allowing this would not only break the semantics of the name "DAG" (one DAG object would effectively be many different DAGs depending on context), it would also bring complexity that is hard to comprehend or to visualize from a UI perspective. For example, if you look at the "tree view", you can imagine that such a useful view wouldn't work for a DAG whose shape changes a lot from run to run.

Knowing this, here are some options around heterogeneous DAG shapes:

* For slow-changing DAGs where a few tasks may be added or removed over time, you can generally manage that by being careful with the start_date/end_date of each task, and perhaps populating some historical state when needed (backfill with or without mark_success, depending on your use case). The typical way to handle a DAG shape change is to pause the DAG, set up the right start dates, alter state if/where needed, and unpause the DAG.
* Using templates or PythonOperator, you can force tasks to not run and just succeed based on conditions, basically skipping tasks on arbitrary criteria. The DAG shape stays the same, but tasks are instructed to skip and succeed based on context.
* If runs are very heterogeneous, we recommend instantiating different "singleton" DAGs, each with its own `dag_id` and `schedule_interval='@once'`; each dag_id is expected to run a single time and can have a distinct shape.
* For a major break in shape over time, where the shape is homogeneous before a big change and homogeneous again after it, you may want to keep the before and after DAGs around as two different objects, with their respective start_date/end_date/dag_id ranges not overlapping. Then use either DAG when backfilling and apply the proper logic to the right date range.

So essentially the constraint is that a DAG is a single Directed Acyclic Graph, not a collection of DAGs that depends on input parameters (which is logical given the object's name).
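As a rough sketch of the second option (fixed shape, conditional skipping), the decision logic can live in a plain Python callable that reads the `trigger_dag -c` payload. The helper below is ordinary Python; the Airflow wiring is shown only in comments, since the operator name and signature (ShortCircuitOperator, circa Airflow 1.8) are assumptions on my part, not something from this thread:

```python
def should_run_table_load(table_name, conf):
    """Return True if the task for `table_name` should actually run.

    `conf` is the dict passed via `trigger_dag -c '{"tables": [...]}'`.
    A missing payload (or one with no "tables" key) means "run everything".
    """
    if not conf or "tables" not in conf:
        return True
    return table_name in conf["tables"]

# In the DAG file this could be wired up roughly as follows
# (assumed API, untested):
#
# from airflow.operators.python_operator import ShortCircuitOperator
#
# check_orders = ShortCircuitOperator(
#     task_id='check_orders',
#     python_callable=lambda **ctx: should_run_table_load(
#         'orders', ctx['dag_run'].conf if ctx.get('dag_run') else None),
#     provide_context=True,
#     dag=dag,
# )
# check_orders >> load_orders  # downstream task is skipped when False
```

The DAG keeps one fixed shape; only the per-run behavior of each task changes based on the payload.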
You can easily build a DAG factory as a function that spits out different DAG objects based on params, but the constraint is that each must have a unique `dag_id`.

Note that it could be interesting to have the notion of a "DAG Family" that would represent a set of DAGs that have something in common (for example, because they are generated from the same DAG factory). Unfortunately, introducing a new entity (DAGFamily) may represent a significant amount of work, and it's unclear how it would help beyond what we get from simple conventions like prefixing the dag_id with something that represents the family.

Max

On Tue, May 30, 2017 at 7:30 AM, Scott Halgrim <[email protected]> wrote:
> I think so. It’s not completely clear what you want to do with those
> different tasks but you should be able to create those tasks with a factory
> method. We have a subdag whose tasks vary depending on how many tables it
> finds in our database (one task per table).
>
> Scott
>
> On May 30, 2017, 7:21 AM -0700, Leroy Julien <[email protected]>,
> wrote:
> > Hi,
> >
> > I would like to know if it’s possible to make a DAG with a variable
> number of tasks depending on a parameter given to the 'trigger_dag -c’
> command.
> >
> > Thanks
> > Julien
>
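P.S. To make the DAG-factory idea above concrete, here's a minimal sketch. The `dag_id` derivation is plain Python; the Airflow object construction is left in comments because the exact constructor arguments are assumed, and the function names here (`make_dag_id`, `dag_factory`) are hypothetical, not part of Airflow:

```python
def make_dag_id(base, params):
    """Derive a unique, stable dag_id from the run parameters.

    e.g. make_dag_id('load', {'date': '2017-05-30'})
    gives 'load__date_2017-05-30'. Each distinct params dict yields a
    distinct dag_id, satisfying the uniqueness constraint.
    """
    suffix = '_'.join('{}_{}'.format(k, v) for k, v in sorted(params.items()))
    return '{}__{}'.format(base, suffix)

# Assumed wiring for the singleton-DAG option (untested sketch):
#
# from airflow import DAG
#
# def dag_factory(base, params, start_date):
#     dag = DAG(dag_id=make_dag_id(base, params),
#               schedule_interval='@once',
#               start_date=start_date)
#     # add a variable number of tasks here based on `params`,
#     # one DAG shape per dag_id
#     return dag
```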
