potiuk commented on issue #27476: URL: https://github.com/apache/airflow/issues/27476#issuecomment-1307001887
> First, the initial idea was to use external DAG files as a repository for code, but it was later discovered that git-sync might be a better option. So git-sync was turned on, but `dags.persistence` was not set to false: both were set to true, and the official documentation showed that it was possible to do this.

Yep. I understand that. I am gathering the ideas and evidence to make a stronger case for strongly discouraging turning on both persistence and git-sync. As I mentioned in https://medium.com/apache-airflow/shared-volumes-in-airflow-the-good-the-bad-and-the-ugly-22e9f681afca, this is my (strong) opinion, but it is otherwise a controversial subject. It's very difficult to convince people who think that "shared volumes" are just K8S magic that works. Unless you dig deep enough to understand that git-sync and shared volumes are actually doing the same thing in parallel, and that git-sync's performance optimisations are far better suited to the Airflow case, it's very hard to see why (see the discussion above).

But, as discussed above, I have not yet seen a single argument that convinces me there is even one case where git-sync + persistence is better in any way than git-sync alone (and it is worse in a number of respects). So far I have seen precisely `0` arguments or cases that could convince me. I hope I will eventually convince the other committers and PMC members that we should strongly discourage it in the docs, and that rather than "fixing" this case we should at least warn users not to do it (or maybe even disable this combination outright in our chart). But we are not there yet.
> Second, in the development phase there are not many resources in a locally built Airflow. When testing or debugging code, we don't want to commit temporary code to the git repository; we only want to modify local files and have the changes take effect, then commit everything to git together once debugging is complete, rather than committing useless intermediate debugging code.

Yeah. As I mentioned in the article, shared volumes are perfect for such an "easy start" and quick iterations. But having git-sync + DAG persistence in this case makes no sense: git-sync will clash with manually overwritten files, so it is a very bad idea to have both if you want to iterate on those files manually without git. This is the relevant excerpt (from the "good" part):

> And if your users are mostly data scientists, who are used to iterate and change their files locally and experiment and quickly deploy stuff by just copy pasting they do not need to learn any new tools, nor follow any rigorous deployment workflow - hey we just copy file here and … it works.

> Sounds cool? Yeah. Because it is cool.

> When you have a small-ish installation and a handful of DAGs that are mostly accessed by one user, this is a perfect solution. And yeah, in such a case I’d heartily recommend deploying Airflow with shared volumes.
