Hi,

I have ingestion pipelines that consume data from a source and save the
output to a destination in S3.
We have around 300 pipelines (reading from different MySQL replicas) and
around 300 DAGs for them.
I am keeping separate DAGs because some need to trigger at a different time
of day or a different minute of the hour. But all the DAGs look the same,
with a single task that runs spark-submit; only the arguments to
spark-submit change from one ingestion job to the next.

Is it a good idea to have just one DAG file and vary the values inside it
to generate multiple DAGs from it?

From my point of view, with dynamic DAGs:
- I feel I lose visibility into what is happening.
- If I want to delete a DAG, will there be an issue?
- Is it heavy on the scheduler's parsing side?

The advantage of the dynamic approach is that I would have one file for all
the DAGs and one YAML file to hold all the configuration. I'll pass each
configuration file as an argument to the spark-submit job. The YAML file
would also hold the start time of each DAG, its schedule interval, etc.
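To make the question concrete, here is a minimal sketch of what I mean by
one DAG-factory file. The config keys (dag_id, schedule, start_date,
spark_args) and file names are hypothetical; the Airflow and PyYAML parts
are left as comments so only the stdlib config-to-kwargs logic is shown:

```python
from datetime import datetime

# import yaml
# from airflow import DAG
# from airflow.providers.apache.spark.operators.spark_submit import (
#     SparkSubmitOperator,
# )

# In practice this would come from yaml.safe_load(open("pipelines.yaml"));
# the entries below are made-up examples of the shape I have in mind.
CONFIG = [
    {"dag_id": "ingest_orders", "schedule": "0 3 * * *",
     "start_date": "2023-01-01", "spark_args": ["--conf", "orders.yaml"]},
    {"dag_id": "ingest_users", "schedule": "30 * * * *",
     "start_date": "2023-01-01", "spark_args": ["--conf", "users.yaml"]},
]

def build_dag_kwargs(entry):
    """Translate one config entry into DAG constructor kwargs."""
    return {
        "dag_id": entry["dag_id"],
        "schedule_interval": entry["schedule"],
        "start_date": datetime.fromisoformat(entry["start_date"]),
        "catchup": False,
    }

for entry in CONFIG:
    kwargs = build_dag_kwargs(entry)
    # with DAG(**kwargs) as dag:
    #     SparkSubmitOperator(
    #         task_id="spark_ingest",
    #         application="/jobs/ingest.py",
    #         application_args=entry["spark_args"],
    #     )
    # globals()[kwargs["dag_id"]] = dag  # so the scheduler discovers each DAG
```

The whole YAML file is parsed on every scheduler parse of this one file,
which is part of what I am asking about regarding parsing load.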
