Hi,

Thanks for your answer, Max. That's exactly what I thought; I asked the question just to be sure.
Julien

> On May 30, 2017, at 23:12, Maxime Beauchemin <[email protected]> wrote:
>
> Actually it's not really possible, by design. Airflow makes it such that
> your script can only define one DAG shape for a single DAG object that has
> a unique corresponding `dag_id`. You cannot really express the use case
> where your DAG would have a different shape over time (or given a DagRun
> payload) using a single DAG object.
>
> Allowing this would not only break the semantics of the name DAG (making a
> single DAG object stand for many different DAGs depending on context) but
> also bring much more complexity that may be hard to comprehend or simply
> visualize from a UI perspective. For example, if you look at the "tree
> view", you can imagine that such a [useful] view wouldn't work in the
> context of a DAG that changes a lot from run to run.
>
> Knowing this, here are some options around heterogeneous DAG shapes:
> * For slow-changing DAGs where a few tasks may be added or removed over
> time, you can generally manage that by being careful around the
> start_date/end_date of the task, and perhaps populating some historical
> states when needed (backfill with or without mark_success depending on your
> use case). The typical way to approach a DAG shape change can involve
> pausing the DAG, setting up the right start date, altering state if/where
> needed, and unpausing the DAG.
> * Using templates or PythonOperator, you can force tasks to not run and
> just succeed based on conditions, basically skipping tasks depending on
> arbitrary criteria. The DAG shape is the same, but tasks are instructed to
> skip and succeed based on context.
> * If each run is very heterogeneous across runs, we recommend that you
> instantiate different "singleton" DAGs with a different `dag_id` using
> `schedule_interval='@once'`. Each dag_id is expected to run a single time
> and can have a distinct shape.
> * For a major break in shape over time, where the shape is homogeneous
> before a big change, then there's a major change, then it's homogeneous
> again, you may want to keep the before and after DAGs around as 2 different
> objects, with their respective start_date/end_date/dag_id that do not
> overlap. That way you can use either DAG when backfilling and apply the
> proper logic to the right date range.
>
> So essentially the constraint is that a DAG is a single Directed Acyclic
> Graph, not a collection of DAGs that depend on an input parameter (that's
> logical given the object's name). You can easily build a DAG factory as a
> function that can spit out different DAG objects based on params, but it's
> a constraint that each has a unique `dag_id`.
>
> Note that it could be interesting to have the notion of a "DAG Family",
> that could represent a set of DAGs that have something in common (for
> example, if they are generated from the same DAG Factory). Unfortunately
> introducing a new entity (DAGFamily) may represent a significant amount of
> work. It's also unclear how introducing this notion would help beyond what
> we get from simple conventions like prefixing the dag_id with something
> that represents the DAG family.
>
> Max
>
> On Tue, May 30, 2017 at 7:30 AM, Scott Halgrim <[email protected]>
> wrote:
>
>> I think so. It’s not completely clear what you want to do with those
>> different tasks, but you should be able to create those tasks with a
>> factory method. We have a subdag whose tasks vary depending on how many
>> tables it finds in our database (one task per table).
>>
>> Scott
>>
>> On May 30, 2017, 7:21 AM -0700, Leroy Julien <[email protected]>,
>> wrote:
>>> Hi,
>>>
>>> I would like to know if it’s possible to make a DAG with a variable
>>> number of tasks depending on a parameter given to the 'trigger_dag -c'
>>> command.
>>>
>>> Thanks
>>> Julien
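
For reference, the "same shape, tasks decide whether to do work" option from the thread could look roughly like the sketch below. It is only an illustration under stated assumptions: `process_table` and the `tables` payload key are hypothetical names, and the function is plain Python so it runs without Airflow installed. In an Airflow 1.x DAG file it would be passed as `python_callable` to a `PythonOperator` with `provide_context=True`, one task per known table, and `dag_run.conf` would hold the JSON given to `trigger_dag -c`.

```python
from types import SimpleNamespace


def process_table(table, **context):
    """Callable meant for one PythonOperator per known table (hypothetical name).

    Returns "processed" or "skipped"; in both cases the task ends successfully,
    so the DAG keeps the same shape on every run.
    """
    dag_run = context.get("dag_run")
    # dag_run.conf carries the `trigger_dag -c` payload; it may be None.
    conf = (dag_run.conf if dag_run is not None else None) or {}
    requested = conf.get("tables")  # hypothetical payload key
    if requested is not None and table not in requested:
        # Nothing to do for this run: return early and let the task succeed.
        return "skipped"
    # The real work for `table` would go here.
    return "processed"


# Stand-in for the context Airflow would pass to the callable.
fake_run = SimpleNamespace(conf={"tables": ["users", "orders"]})
print(process_table("users", dag_run=fake_run))   # processed
print(process_table("events", dag_run=fake_run))  # skipped
```

Note that "skipped" here only means the task returns without doing work and is marked successful, as described in the thread; it is not Airflow's separate skipped task state.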

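The DAG factory idea with a family prefix on the `dag_id` can be sketched as follows. This is a minimal illustration, not Airflow code: `DagSpec`, `make_dag_spec`, the `table_sync` prefix and the table names are all hypothetical, and the spec is a plain stand-in for the arguments one would hand to `airflow.DAG` and its operators. It only models the constraint from the thread, namely that every distinct shape gets its own unique `dag_id` and, for singleton DAGs, `schedule_interval='@once'`.

```python
from collections import namedtuple

# Plain stand-in for what would be passed to airflow.DAG(...) and its tasks.
DagSpec = namedtuple("DagSpec", ["dag_id", "schedule_interval", "task_ids"])


def make_dag_spec(family, run_key, tables):
    """Describe one singleton DAG: unique dag_id, runs once, one task per table."""
    # Prefixing the dag_id with the family name is the simple convention
    # mentioned in the thread as a substitute for a "DAG Family" entity.
    dag_id = "{}__{}".format(family, run_key)
    task_ids = ["load_{}".format(table) for table in tables]
    return DagSpec(dag_id=dag_id, schedule_interval="@once", task_ids=task_ids)


specs = [
    make_dag_spec("table_sync", "2017_05_30", ["users", "orders"]),
    make_dag_spec("table_sync", "2017_05_31", ["users", "orders", "events"]),
]

# Each shape must live under its own dag_id.
assert len({spec.dag_id for spec in specs}) == len(specs)

for spec in specs:
    print(spec.dag_id, spec.task_ids)
```

In a real DAG file, each spec would presumably be turned into a DAG object bound to a module-level name so that the scheduler can discover it; that step is left out here on purpose.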