Hello,

I'm currently using Airflow for some ETL tasks where I submit a Spark job
to a cluster and poll until it is complete. This workflow is nice because it
is typically a single DAG. I'm now starting to do more machine learning
tasks and need to build a model per client, for 1000+ clients. My
Spark cluster is capable of handling this workload, but it doesn't
seem scalable to write 1000+ DAGs to fit a model for each client. I want
each client to have its own task instance so that if it fails it can be
retried without rerunning all 1000+ tasks. How do I handle
this type of workflow in Airflow?
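
For concreteness, here's a rough sketch of the kind of structure I'm imagining: a single DAG that loops over a client list and creates one task per client, so each task can retry on its own. The client list and the fit_model_for_client callable are just placeholders, not my real code.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Placeholder: in reality this would submit a Spark job for one client
    # and poll until it finishes.
    def fit_model_for_client(client_id, **context):
        raise NotImplementedError(f"fit model for {client_id}")

    # Placeholder list; in practice this would be 1000+ client IDs.
    CLIENT_IDS = ["client_001", "client_002", "client_003"]

    with DAG(
        dag_id="per_client_model_training",
        start_date=datetime(2023, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        for client_id in CLIENT_IDS:
            PythonOperator(
                task_id=f"fit_model_{client_id}",
                python_callable=fit_model_for_client,
                op_kwargs={"client_id": client_id},
                retries=2,  # each client's task retries independently on failure
            )

Is generating tasks in a loop like this the right approach, or is there a better pattern for this?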
