potiuk commented on issue #27476:
URL: https://github.com/apache/airflow/issues/27476#issuecomment-1307001887

   > First, The initial idea was to use the external dag files as a repository 
for code, but it was later discovered that git-sync might be a better option. 
So git-sync was turned on but not set dags.persistence to false, both were set 
to true, and the official documentation showed that it was possible to do this.
   
   Yep. I understand that. I am gathering ideas and evidence to make a 
stronger case for strongly discouraging turning on persistence and git-sync 
together. As mentioned in 
https://medium.com/apache-airflow/shared-volumes-in-airflow-the-good-the-bad-and-the-ugly-22e9f681afca,
this is my (strong) opinion, but it is otherwise a controversial subject. 
It's very difficult to convince people who think that "shared volumes" is just 
K8S magic that works. 
   
   Unless you dig deeply enough to understand that git-sync and shared volumes 
are actually doing the same thing in parallel, and that git-sync's performance 
optimisations are far better suited to the Airflow case, it's very difficult 
to see why combining them is a problem (see the discussion above). 
   
   But - as discussed above - I have not yet seen a single argument that 
convinces me that there is even a single case where git-sync + persistence is 
better at anything than just git-sync (and it is worse in a number of aspects). 
So far I have seen precisely `0` arguments and cases that could convince me.
   
   I hope I will eventually manage to convince other committers and PMC 
members that we should strongly discourage it in the docs, and that rather 
than "fixing" the case we should at least warn users that they should not do 
it (or maybe even outright disable this possibility in our chart).
   
   But we are not there yet.
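
   To make the idea of "warning the users" concrete, here is a minimal sketch 
of such a check. It is not something the chart does today. The function name 
`check_dag_sources` is hypothetical, and the keys `dags.gitSync.enabled` and 
`dags.persistence.enabled` are assumed to match the chart's values layout.

```python
from typing import Any, Mapping, Optional


def check_dag_sources(values: Mapping[str, Any]) -> Optional[str]:
    """Return a warning string if both DAG sources are enabled, else None.

    `values` is assumed to be the parsed Helm values as a nested dict.
    """
    dags = values.get("dags") or {}
    git_sync = bool((dags.get("gitSync") or {}).get("enabled", False))
    persistence = bool((dags.get("persistence") or {}).get("enabled", False))

    if git_sync and persistence:
        return (
            "dags.gitSync.enabled and dags.persistence.enabled are both true; "
            "git-sync may clash with files written to the shared volume. "
            "Enable only one of them."
        )
    return None


if __name__ == "__main__":
    both_enabled = {
        "dags": {
            "gitSync": {"enabled": True},
            "persistence": {"enabled": True},
        }
    }
    print(check_dag_sources(both_enabled))
```

   A check like this only warns; whether the chart should refuse such a 
configuration outright is the open question discussed above.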
   
   > Second,In the development phase, there is not much resource in the local 
built airflow, when testing or debugging code, does not want to add some 
temporary code to submit to the git warehouse, only to modify the local file to 
take effect, all debugging completed finally together to submit to git, rather 
than some useless debugging process code submission
   
   Yeah. As I mentioned in the article, shared volumes are perfect for such 
"easy start" and quick iterations. But having git-sync + DAG persistence in 
this case makes no sense (git-sync will clash with manually overwritten files, 
so it is a very bad idea to have both if you want to manually iterate on those 
files without git). This is the relevant excerpt (in the "good" part):
   
   > And if your users are mostly data scientists, who are used to iterate and 
change their files locally and experiment and quickly deploy stuff by just copy 
pasting they do not need to learn any new tools, nor follow any rigorous 
deployment workflow — hey we just copy file here and … it works.
   
   > Sounds cool? Yeah. Because it is cool.
   
   > When you have a small-ish installation and a handful of DAGs that are 
mostly accessed by one user, this is a perfect solution. And yeah, in such a 
case I’d heartily recommend deploying Airflow with shared volumes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
