Sorry again for posting in the DEV group - this is a user-type question, but I do not think we have a user mailing list, and I do not feel that Gitter is appropriate for this sort of discussion.
I am actively testing Airflow for a specific use case: generating workflows/tasks for 200-300 tables in my source database. Every table will have a set of pretty standard tasks (~6-8), and the tasks will have some conditions (e.g. on some runs some of the tasks will be skipped). After looking at Oozie, Luigi, Pinball and Airflow, and watching numerous presentations, I thought Airflow was a perfect match for this. I am really blown away by the features, the potential, and the use cases. I know many of the committers are doing something similar, and I'd love to hear the details and some guidance on best practices. I have heard of 3 options, I think:

#1 Create one big DAG (in my case it would be a DAG with 300x8 tasks)
#2 Create one DAG that generates smaller DAGs (as described here: https://wecode.wepay.com/posts/airflow-wepay)
#3 A combination of #1 and #2, e.g. an external .py file that generates a "static" DAG on demand (e.g. when adding or removing a table)

My second question is related to this concept. The Airflow webserver and scheduler poll the DAG folder to fill the DagBag, and they do that very frequently by default (every minute?). The problem is that it takes time to generate a DAG dynamically - in my case I will be using some metadata from YAML or a database, and this process might well take a minute or two. How do you deal with this?

Thanks again for such an amazing project!
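To make the options above concrete, here is a rough sketch of the metadata-driven generation I have in mind. The table names, the task list, and the dict standing in for a real airflow.DAG object are all made up for illustration; only the overall shape (one generated workflow per table, registered at module top level so the scheduler's DagBag picks it up) is the point:

```python
TABLES = ["customers", "orders", "payments"]  # would really come from YAML or a DB

# the ~6-8 standard tasks every table gets
STANDARD_TASKS = ["extract", "stage", "validate", "load", "audit", "cleanup"]

def create_dag(table):
    """Build the per-table workflow; here just task ids wired linearly.

    In real code this would construct an airflow.DAG plus operators and
    set dependencies between them instead of returning a plain dict.
    """
    tasks = ["{0}_{1}".format(table, t) for t in STANDARD_TASKS]
    deps = list(zip(tasks, tasks[1:]))  # each task depends on the previous one
    return {"dag_id": "etl_{0}".format(table), "tasks": tasks, "deps": deps}

# Airflow discovers DAG objects found at module top level, so the usual
# trick for dynamic generation is to register each one in globals():
for _table in TABLES:
    _dag = create_dag(_table)
    globals()[_dag["dag_id"]] = _dag
```

With real DAG objects this single .py file would produce 300 small DAGs (option #2/#3) instead of one 300x8-task DAG (option #1).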
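On the second question, the workaround I am currently considering is caching the expensive metadata lookup to a local file, so that each DagBag refresh only re-reads the cache instead of hitting YAML/the database. The path, TTL, and fetch function below are placeholders - is this how others deal with it, or is there a better way?

```python
import json
import os
import tempfile
import time

# placeholder cache location and refresh interval
CACHE_PATH = os.path.join(tempfile.gettempdir(), "table_metadata_cache.json")
CACHE_TTL = 15 * 60  # seconds; re-run the slow lookup at most every 15 minutes

def fetch_metadata_from_source():
    # stand-in for the slow YAML parse / database query that can take minutes
    return {"tables": ["customers", "orders", "payments"]}

def load_metadata():
    """Serve metadata from the file cache while it is fresh."""
    try:
        if time.time() - os.path.getmtime(CACHE_PATH) < CACHE_TTL:
            with open(CACHE_PATH) as f:
                return json.load(f)
    except OSError:
        pass  # no cache file yet; fall through to the slow path
    meta = fetch_metadata_from_source()
    with open(CACHE_PATH, "w") as f:
        json.dump(meta, f)
    return meta
```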
