Sorry again for posting in the DEV group since this is a user-type question, but
I do not think we have a user mailing list, and I do not feel that Gitter is
appropriate for this sort of discussion.

I am actively testing Airflow for a specific use case: generating
workflows/tasks for 200-300 tables in my source database. Every table
will have a set of fairly standard tasks (~6-8), and some tasks will be
conditional (e.g. on some runs, some of the tasks will be skipped).

After looking at Oozie, Luigi, Pinball and Airflow, and watching numerous
presentations, I thought Airflow was a perfect match for this. I am really
blown away by the features, the potential, and the range of use cases.

I know many of the committers are doing something similar, and I'd love to hear
the details and get some guidance on best practices. I believe I have heard of
three options:

#1 Create one big DAG (in my case it would be a single DAG with 300 x 8 tasks;
roughly sketched below, after this list)

#2 Create one DAG that generates smaller DAGs (as described here)

#3 A combination of #1 and #2, e.g. an external .py file that generates a
"static" DAG on demand (e.g. when a table is added or removed)
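
To make option #1 concrete, here is a rough sketch of the kind of module I have
in mind (the table names, task names, and commands are made up, and the real
chain would be 6-8 tasks per table; the ShortCircuitOperator stands in for the
"skip on some runs" condition):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import ShortCircuitOperator

TABLES = ["customers", "orders", "invoices"]  # really 200-300 names


def has_new_rows(table):
    # Placeholder: the real check would look at the source table.
    return True


dag = DAG(
    dag_id="warehouse_load",
    start_date=datetime(2016, 1, 1),
    schedule_interval="@daily",
)

for table in TABLES:
    # If the check returns False, the downstream tasks for this table
    # are skipped on that run.
    check = ShortCircuitOperator(
        task_id="check_{}".format(table),
        python_callable=lambda t=table: has_new_rows(t),
        dag=dag,
    )
    extract = BashOperator(
        task_id="extract_{}".format(table),
        bash_command="echo extracting {}".format(table),
        dag=dag,
    )
    load = BashOperator(
        task_id="load_{}".format(table),
        bash_command="echo loading {}".format(table),
        dag=dag,
    )
    check.set_downstream(extract)
    extract.set_downstream(load)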

Second question, related to this: the Airflow webserver and scheduler poll the
DAG folder to fill the DagBag, and they do so very frequently by default (every
minute?). The problem is that generating the DAG dynamically takes time - in my
case I will be using metadata from YAML or a database, and that process might
well take a minute or two. How do you deal with this?
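
For context, the expensive part looks roughly like this (the connection
details, query, and table name are made up, and the YAML variant would read a
file instead); since it runs at module import time, it is re-executed on every
DagBag refresh by both the webserver and the scheduler:

import psycopg2  # hypothetical: the table metadata lives in Postgres

# This runs every time the scheduler or webserver re-imports the DAG file.
conn = psycopg2.connect(host="metadata-db", dbname="dwh", user="airflow")
cur = conn.cursor()
cur.execute("SELECT table_name, load_type FROM etl_tables")  # made-up table
TABLE_METADATA = cur.fetchall()
conn.close()

# ...the 300-table DAG is then built from TABLE_METADATA as in the
# sketch above...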

Thanks again for such an amazing project!
