potiuk commented on PR #27124:
URL: https://github.com/apache/airflow/pull/27124#issuecomment-1291170469

   I think you are complicating Airflow's DAG syncing needlessly. IMHO it 
should not be Airflow's job to combine syncing data from multiple sources, and 
I would be the first one to strongly oppose such a feature. It has a number of 
problems (including managing access to write different DAGs, knowing when to 
sync which dirs, knowing what the relationships between different subfolders 
are, etc.) that should be solved externally to Airflow. We certainly do not 
want to solve all those problems at the level of Airflow. And we do not have 
to. This problem can be solved in a much better way.
   
   I believe you should combine all of your DAG sources outside of Airflow and 
let Airflow use a single GitSync from a single repository (by using submodules 
and a recursive update you should be able to pull DAGs from multiple 
repositories).
   
   You can achieve what you want by:
   
   1) Creating a single git repo that contains multiple submodules, each in 
a different directory
   2) Making sure your CI/CD and other sources push their DAGs to their 
respective Git repos
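
   The two steps above can be sketched roughly as follows. This is only an 
illustration, not something taken from the talk or from Airflow itself: the 
repo URLs, the directory names, and the helper functions are all hypothetical. 
The git subcommands it builds (`git submodule add` and `git submodule update 
--init --recursive --remote`) are standard git.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping: sub-directory in the umbrella repo -> team DAG repo URL.
TEAM_REPOS = {
    "team_a": "https://example.com/org/team-a-dags.git",
    "team_b": "https://example.com/org/team-b-dags.git",
}


def submodule_commands(team_repos):
    """Build the git commands that add each team repo as a submodule."""
    commands = []
    for directory, url in sorted(team_repos.items()):
        # Each repo lands in its own directory of the umbrella repo.
        commands.append(["git", "submodule", "add", url, directory])
    commands.append(["git", "commit", "-m", "Add team DAG repos as submodules"])
    return commands


def update_command():
    """Build the command that moves every submodule to its remote branch tip."""
    return ["git", "submodule", "update", "--init", "--recursive", "--remote"]


def run(commands, umbrella_dir):
    """Run the commands inside an existing checkout of the umbrella repo."""
    for command in commands:
        subprocess.run(command, cwd=Path(umbrella_dir), check=True)
```

   Calling `run(submodule_commands(TEAM_REPOS), "path/to/umbrella")` once would 
set up the umbrella repo. A CI job could then run `update_command()` and commit 
the result, so that every commit in the umbrella repo records one combination 
of the sub-repos, and Airflow only ever syncs that single repository.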
   
   This (and even far more complex) setup has been successfully deployed and 
works very well for a number of companies. You can watch this fantastic talk 
from JAGEX, where they explain how they manage to run 170+ GitHub repos 
synced by a single GitSync and effectively manage dependencies between 
different parts of their DAG folder using git. This lets you set up your own 
conventions and CI checks, switch branches of different repos manually for 
all your sub-folders, and generally keep a full history of all "valid" 
combinations of all repos in a single "umbrella" git repo where all the 
sub-dags are kept as submodules. It can't get better than that - all without 
even touching Airflow, either Airflow itself or the Helm chart.
   
   The talk is here 
https://airflowsummit.org/sessions/2022/manage-dags-at-scale/
   
   Please watch it and let me know why you think such a solution will not work 
for you. I think you will need an extremely strong justification if you want to 
increase complexity and add such a feature to Airflow, because you can solve it 
in a much simpler way by following good, proven practices.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
