Hi, I have ingestion pipelines which consume data from a source and save the data to a destination S3 bucket. We have around 300 pipelines (from different MySQL replicas) and around 300 DAGs for them. I keep separate DAGs because some trigger at a different time of day or a different minute of the hour, but all the DAGs look the same: one task that runs spark-submit. Only the arguments to spark-submit change from one ingestion job to the next.
Is it a good idea to have just one DAG file and change values inside it to generate multiple DAGs? From my point of view, with dynamic DAGs I feel I lose visibility into what is happening. If I want to delete a DAG, will there be an issue? Is it heavy on the scheduler's parsing side? The advantage of dynamic DAGs is that I'd have one file for all DAGs and one YAML file holding all the configuration. I'd pass the configuration files as arguments to my spark-submit job. The YAML file would also hold each DAG's start time, schedule interval, etc.
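For reference, the pattern being described is usually implemented as a DAG factory that loops over the config and registers each generated DAG in the module's globals so the Airflow scheduler discovers them. Below is a minimal sketch under assumed names (the pipeline IDs, config keys, S3 paths, and `ingest.py` script are all hypothetical, and the config is shown as an inline dict rather than a loaded YAML file just to make its shape visible):

```python
# Sketch of a dynamic-DAG factory for many near-identical spark-submit
# pipelines. Assumes Airflow 2.x; the Airflow imports live inside the
# factory function so the config-building logic stays importable on its own.
from datetime import datetime

# Shape of the single YAML file described above, one entry per pipeline.
# In a real DAG module you would yaml.safe_load() this from disk instead.
PIPELINES = {
    "orders_replica": {
        "schedule": "0 2 * * *",            # daily at 02:00
        "start_date": "2023-01-01",
        "spark_args": "--conf-file s3://configs/orders.yaml",
    },
    "users_replica": {
        "schedule": "30 */4 * * *",         # every 4 hours at minute 30
        "start_date": "2023-01-01",
        "spark_args": "--conf-file s3://configs/users.yaml",
    },
}


def build_spark_submit_command(name: str, cfg: dict) -> str:
    """Build the spark-submit command line for one pipeline entry."""
    return f"spark-submit ingest.py --pipeline {name} {cfg['spark_args']}"


def create_dag(name: str, cfg: dict):
    """Create one DAG object from a config entry (requires Airflow)."""
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id=f"ingest_{name}",
        schedule_interval=cfg["schedule"],
        start_date=datetime.fromisoformat(cfg["start_date"]),
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="spark_submit",
            bash_command=build_spark_submit_command(name, cfg),
        )
    return dag


# In the actual DAG file, register every generated DAG at module level so
# the scheduler picks them all up (commented out here since it needs a
# running Airflow installation):
#
# for name, cfg in PIPELINES.items():
#     globals()[f"ingest_{name}"] = create_dag(name, cfg)
```

On the deletion question: with this pattern, removing an entry from the YAML makes that DAG vanish from the UI on the next parse, but its metadata and history stay in the Airflow database until you delete the DAG via the UI or CLI. Parsing cost is roughly the same whether 300 DAGs come from 300 small files or one factory file, since the scheduler re-parses the whole DAG folder either way.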
