Hello!

I have a question on usage: is anyone using Airflow for running many
one-off / ad-hoc DAGs?

I really like Airflow for managing the dependencies of our scheduled ML
pipeline. We'd also like to reuse the same dependencies for running one-off
ML experiments, where the DAG might differ slightly.

I've made this use case work so far by uploading DAGs to the Airflow
hosts under a dynamic DAG id, so we have isolation between each DAG run /
ML experiment. However, as the number of DAGs in Airflow grows, the
scheduler slows down significantly (also seen in this reported issue:
https://issues.apache.org/jira/browse/AIRFLOW-1139 ). Even if I turn "off"
a DAG, I notice it is still loaded into the DagBag. Is anyone else
having trouble with a large number of DAGs? And is anyone else running
many one-off / run-once DAGs?
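For concreteness, the dynamic-DAG-id approach I'm using looks roughly like
this (names and the template body are illustrative, not our real pipeline):
each experiment writes its own DAG file into the dags folder under a unique
dag_id, so runs never collide.

```python
import uuid
from pathlib import Path

# Illustrative template for a one-off experiment DAG; the real DAG body
# would define the actual ML tasks and their dependencies.
DAG_TEMPLATE = '''\
from datetime import datetime
from airflow import DAG

dag = DAG(
    dag_id="{dag_id}",
    start_date=datetime(2017, 1, 1),
    schedule_interval=None,  # one-off: triggered manually, never scheduled
)
'''


def write_experiment_dag(dags_folder, experiment_name):
    """Write a DAG file with a unique dag_id so each experiment is isolated.

    Returns the generated dag_id; the file lands in Airflow's dags folder,
    where the scheduler will pick it up on its next parse.
    """
    dag_id = "exp_{}_{}".format(experiment_name, uuid.uuid4().hex[:8])
    dag_path = Path(dags_folder) / "{}.py".format(dag_id)
    dag_path.write_text(DAG_TEMPLATE.format(dag_id=dag_id))
    return dag_id
```

The catch, as described above, is that every one of these files stays in
the DagBag forever, which is what seems to be slowing the scheduler down.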

Thanks in advance for any insight!
Duy
