potiuk opened a new pull request, #28822:
URL: https://github.com/apache/airflow/pull/28822

   Combining Git Sync and Persistence for DAGs makes very little sense and 
largely misleads our users about what the combination does.
   
   Git Sync makes DAG folder synchronisation atomic by checking out a complete 
copy of the DAGs folder and swapping the symbolic link that points to it. This 
does not play well with networked persistence.
   
   The combination makes it very easy for users who are unaware of how git-sync 
and persistence work under the hood to walk into several traps:
   
   * git-sync on persistent remote volumes such as EFS generates a LOT of extra 
traffic because of the way git-sync works: it creates a second working folder 
for DAGs and repoints the symbolic link to it, which effectively forces a full 
sync of the whole DAG folder for all involved instances with every commit
   * because that sync is distributed over multiple clients of the persistent 
volume, git-sync loses its atomicity property; in the case above, where there 
are bursts of synchronisation between multiple nodes, it is very likely to 
trigger inconsistent DAG parsing
   * the problem is amplified when the network volumes are distributed among 
multiple nodes and there are networking limits (for example, when IOPS are not 
provisioned in EFS). The amount of traffic generated at sync time might cause 
even more inconsistencies, solvable only by paying for extra IOPS (which would 
not normally be needed)
   * users might be tricked into using gitSync while also updating DAGs through 
persistence (basically combining the development-friendly DAG distribution 
over persistent volumes with production-ready git-sync), without being aware 
that git-sync will override the manually synced DAGs when swapping the 
symbolic links
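
The checkout-and-swap pattern described above can be sketched roughly as follows. This is only an illustration of the general technique, not git-sync's actual code: the function `publish_checkout`, the `rev-<revision>` directory naming, and the `current` link name are all hypothetical, and the sketch assumes a local POSIX filesystem, where renaming over an existing symlink is atomic.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Tuple


def publish_checkout(
    root: Path, revision: str, files: Mapping[str, str], link_name: str = "current"
) -> Path:
    """Write a complete copy of the DAGs for `revision`, then repoint the symlink.

    All names here are hypothetical; this only mimics the pattern.
    """
    # 1. Materialise a full copy of the DAG folder in a fresh directory.
    checkout = root / f"rev-{revision}"
    checkout.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        (checkout / name).write_text(content)

    # 2. Create a temporary symlink pointing at the new copy.
    link = root / link_name
    tmp_link = root / f".{link_name}.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(checkout.name, tmp_link)  # relative target inside `root`

    # 3. Rename the temporary link over the published one. On a local POSIX
    #    filesystem readers see either the old or the new tree, never a mix.
    os.replace(tmp_link, link)
    return link


def demo() -> Tuple[str, str]:
    """Publish two revisions and read the DAG file through the link each time."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        link = publish_checkout(root, "aaa", {"dag.py": "v1"})
        first = (link / "dag.py").read_text()
        publish_checkout(root, "bbb", {"dag.py": "v2"})
        second = (link / "dag.py").read_text()
        return first, second
```

Every new revision writes the entire folder again, which is where the extra traffic on a networked volume comes from. The single-rename guarantee also applies only to the local filesystem performing the rename; other clients of a shared volume may observe the new files and the repointed link at different times.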
   
   Closes: #27545
   Closes: #27476
   Closes: #27080
   
   Related: #27124
   
   
   

