We've been evolving from the type 1 you describe to a pull/poll version of the type 2 you describe. With type 1, it is really hard to tell what's going on: all the UI views become useless because they are so huge. Having one big DAG also means you can't turn off the scheduler for individual parts, and the whole DAG fails if one task does, so if you can functionally separate them, I think that gives you more configuration options. Our biggest DAG now has roughly 22 x 10 tasks, which is still too big in our opinion. We use ExternalTaskSensors to link DAGs together, which is more of a pull/poll paradigm, but you could use a TriggerDagRunOperator if you wanted more of a push/trigger paradigm, which is what I hear you describing in type 2.
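To make the pull/poll vs. push/trigger distinction concrete, here is a minimal sketch in plain Python rather than Airflow code. It assumes a hypothetical in-memory `run_state` dict standing in for Airflow's metadata database; `poll_for_upstream` mimics what a sensor does (keep checking whether a task in another DAG succeeded for the same execution date), and `trigger_downstream` mimics the push style, where the upstream side enqueues a run of the downstream DAG. All names here are illustrative, not Airflow APIs.

```python
import time
from typing import Dict, List, Tuple

# Hypothetical stand-in for Airflow's metadata DB:
# (dag_id, task_id, execution_date) -> task state
RunState = Dict[Tuple[str, str, str], str]


def poll_for_upstream(
    run_state: RunState,
    dag_id: str,
    task_id: str,
    execution_date: str,
    poke_interval: float = 0.01,
    timeout: float = 1.0,
) -> bool:
    """Pull/poll style: the downstream side repeatedly checks the upstream task."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if run_state.get((dag_id, task_id, execution_date)) == "success":
            return True
        time.sleep(poke_interval)
    return False  # a real sensor would fail (or retry) once it times out


def trigger_downstream(
    queue: List[Tuple[str, str]], dag_id: str, execution_date: str
) -> None:
    """Push/trigger style: the upstream side enqueues a run of the downstream DAG."""
    queue.append((dag_id, execution_date))
```

In Airflow itself, the polling side corresponds to an ExternalTaskSensor configured with `external_dag_id` and `external_task_id`, and the push side to a TriggerDagRunOperator with `trigger_dag_id`. The trade-off is that polling keeps each DAG independently schedulable, while triggering keeps the dependency explicit in the upstream DAG.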
To your second question: our DAGs are dynamic, based on the results of an API call we embed in the DAG file, and our scheduler is set to a 5-second interval for each attempt to refill the DagBag. We haven't had any issues with that or needed to do anything special. I think that's because the scheduler polls the files frequently, our API call is relatively fast, our DAGs run on a 24-hour schedule_interval, and the resulting DAG structure is not too large or complicated. It really comes down to this: if you give the scheduler a lot of work to do to determine the DAG shape, it will take a while.

Laura

On Fri, Oct 21, 2016 at 10:52 AM, Boris Tyukin <bo...@boristyukin.com> wrote:
> Guys, would you mind to chime in and share your experience?
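A dynamic DAG like the one Laura describes has to recompute its shape every time the scheduler parses the file, so the shape logic should be cheap. The following is a minimal sketch under stated assumptions: `fetch_table_list` is a hypothetical stand-in for the embedded API call (it parses a hard-coded JSON payload instead of making a real HTTP request), and the extract/load/report layout is an invented example, not the poster's actual DAG.

```python
import json
from typing import List, Tuple


def fetch_table_list() -> List[str]:
    """Hypothetical stand-in for the API call embedded in the DAG file.

    A real DAG file would make an HTTP request here, and that request runs
    on every parse, so it needs to be fast.
    """
    payload = '{"tables": ["orders", "customers", "invoices"]}'
    return json.loads(payload)["tables"]


def build_dag_shape(tables: List[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Derive task ids and dependency edges from the API result.

    Each table gets an extract -> load pair, and every load task feeds a
    single final 'report' task.
    """
    task_ids: List[str] = []
    edges: List[Tuple[str, str]] = []
    for table in tables:
        extract, load = f"extract_{table}", f"load_{table}"
        task_ids += [extract, load]
        edges.append((extract, load))
        edges.append((load, "report"))
    task_ids.append("report")
    return task_ids, edges
```

Keeping the shape logic in a plain function like this makes it easy to see (and time) how much work each parse does; in the actual DAG file you would loop over `task_ids` and `edges` to create operators and set their dependencies.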