potiuk opened a new pull request, #28822: URL: https://github.com/apache/airflow/pull/28822
Combining Git Sync with persistence for DAGs makes very little sense and largely misleads our users about what it does. Git Sync provides atomic synchronisation of the DAG folder by checking out a complete copy of the DAGs folder and swapping the symbolic link that points to it. It does not play well with networked persistence. The combination makes it very easy for users who are unaware of how git-sync and persistence work under the hood to walk into several traps:

* git-sync on persistent remote volumes such as EFS generates a LOT of extra traffic because of the way git-sync works (it creates a second working folder for DAGs and replaces the symbolic link to the folder), which effectively forces a full sync of the whole DAG folder for all involved instances with every commit
* because that sync is distributed over multiple clients of the persistent volume, git-sync loses its atomicity property, and in the case above, where there are bursts of synchronisation between multiple nodes, it is very likely to trigger inconsistent DAG parsing
* the problem is amplified when the network volumes are distributed among multiple nodes and there are networking limits (for example, when IOPS are not provisioned in EFS). The amount of traffic generated at sync might cause even more inconsistencies, solvable only by paying for extra IOPS (which would not normally be needed)
* users might be tricked into using gitSync while also updating DAGs through persistence (basically combining the development-friendly DAG distribution over persistent volumes with production-ready git-sync) without being aware that git-sync will override the manually synced DAGs when swapping the symbolic links

Closes: #27545
Closes: #27476
Closes: #27080
Related: #27124
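
The checkout-and-swap mechanism that the description attributes to git-sync can be illustrated with a minimal Python sketch. This is not git-sync's actual code: the function `publish_checkout`, the `rev-<revision>` directory naming, and the `current` link name are hypothetical, chosen only to show the pattern. It assumes a POSIX filesystem, where renaming one symbolic link over another is atomic.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping


def publish_checkout(
    root: Path,
    revision: str,
    files: Mapping[str, str],
    link_name: str = "current",  # hypothetical name for the published link
) -> Path:
    """Write a complete copy of the files for `revision`, then repoint the link at it."""
    # Each revision gets its own full directory (hypothetical naming scheme).
    checkout = root / f"rev-{revision}"
    checkout.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        (checkout / name).write_text(content)

    # Create the new link under a temporary name, then rename it over the
    # published link. The rename replaces the link in one step, so a local
    # reader sees either the old checkout or the new one, never a mix.
    link = root / link_name
    tmp_link = root / f".{link_name}.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(checkout.name, tmp_link)  # relative target inside `root`
    os.replace(tmp_link, link)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        link = publish_checkout(root, "a", {"dag.py": "# version a\n"})
        print(os.readlink(link), (link / "dag.py").read_text().strip())
        publish_checkout(root, "b", {"dag.py": "# version b\n"})
        print(os.readlink(link), (link / "dag.py").read_text().strip())
```

Note that every publish writes a brand-new directory tree. On a local disk that is cheap, but it is consistent with the concern raised in the description: on a shared network volume, each client has to pick up the whole new tree after every commit, and the single-rename guarantee only holds on the filesystem where the rename happens.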
