potiuk commented on PR #27124: URL: https://github.com/apache/airflow/pull/27124#issuecomment-1291170469
I think you are complicating Airflow's DAG syncing needlessly. IMHO it should not be Airflow's job to combine syncing data from multiple sources, and I would be the first one to strongly oppose such a feature. It has a number of problems (including managing write access to different DAGs, knowing when to sync which dirs, knowing what the relationships between different subfolders are, etc.) that should be solved externally to Airflow. We certainly do not want to solve all those problems at the level of Airflow. And we do not have to.

This problem can be solved in a much better way. I believe you should combine all of your DAG sources outside of Airflow and let Airflow use a single GitSync from a single repository (by using submodules and recursive update you should be able to pull DAGs from multiple repositories). You can achieve what you want by:

1) Creating a single git repo which will have multiple sub-repos - each git sub-repo in a different directory
2) Making sure your CI/CD and other sources push their DAGs to their respective Git repos

This setup (and even far more complex ones) has been successfully deployed and works very well for a number of companies. You can watch this - fantastic - talk from JAGEX where they explain how they manage to run 170+ GitHub repos synced by a single GitSync and effectively manage dependencies between different parts of their DAG folder using git.

This lets you set up your own conventions and CI checks, switch branches of different repos manually for all your sub-folders, and generally keep a full history of all "valid" combinations of all repos in the single "umbrella" git repo where all the sub-dags are kept as submodules. It can't get better than that - all without even touching any of Airflow's complexity - either in Airflow itself or in the Helm chart.
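
To make step 1 concrete, here is a minimal sketch of how the "umbrella" repo could be assembled. The repo URLs, directory names and the `main` branch are placeholders I made up for illustration, and the helper functions are hypothetical, not part of Airflow or GitSync. The script only prints the git commands so you can review them before running them inside the umbrella repo.

```python
import shlex

# Hypothetical team repos, keyed by the sub-directory each one should occupy
# inside the umbrella repo. Names and URLs are placeholders.
DAG_REPOS = {
    "team_a": "https://example.com/org/team-a-dags.git",
    "team_b": "https://example.com/org/team-b-dags.git",
}


def add_submodule_commands(repos, branch="main"):
    """Build one `git submodule add` command per DAG repo, sorted by directory."""
    return [
        ["git", "submodule", "add", "-b", branch, url, directory]
        for directory, url in sorted(repos.items())
    ]


def update_command():
    """Build the command that checks out every submodule, including nested ones."""
    return ["git", "submodule", "update", "--init", "--recursive"]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in add_submodule_commands(DAG_REPOS) + [update_command()]:
        print(shlex.join(cmd))
```

GitSync then only needs to point at the umbrella repo and fetch submodules recursively; please check the git-sync documentation for the exact option in the version you run.
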
The talk is here: https://airflowsummit.org/sessions/2022/manage-dags-at-scale/

Please watch it and let me know why you think such a solution will not work for you. I think you will need an extremely strong justification if you want to increase complexity and add such a feature to Airflow, because you can solve this far more simply by following good, proven practices.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
