potiuk commented on issue #27476: URL: https://github.com/apache/airflow/issues/27476#issuecomment-1305706506
@ephraimbuddy One reason I would think is that this case makes very little sense. Maybe it is unrelated, but I personally think that having both git-sync AND persistency at the same time makes very little sense in general. I have yet to hear a good argument why just "git-sync" would not cut it. No one has yet been able to explain to me convincingly why they want to use Git Sync to atomically sync DAGs to one machine and then let the "network file system" distribute them further to all the other components.

IMHO it makes no sense: it loses the atomicity guarantees that Git-sync provides while incurring much more traffic and (usually) generating a lot of cost from the Airflow Scheduler continuously accessing remote filesystems. Because this is what effectively happens when you use persistency. There is no magic, and the DAGs need to be distributed.

It is so much better to use Git as the synchronisation protocol. It is way better designed and optimized for sharing even huge amounts of source code, and it has everything that is needed to synchronise incremental changes (i.e. commits) in an atomic way (git-sync does it) and only when needed (when a new commit is there). Plus it has this great capability: when you set it as an init container, it has to fully synchronise the whole folder first, so no Airflow component starts before all DAGs are present in the folder.

All those properties are basically gone when you add persistency to Git Sync. And IMHO you get no benefits when you add persistency. Only problems. For all practical purposes, having just git-sync and local filesystems completely separated from each other for each component is way better IMHO. This is explained in detail in this blog post: https://medium.com/apache-airflow/shared-volumes-in-airflow-the-good-the-bad-and-the-ugly-22e9f681afca

@alexakra @chuxiangfeng - maybe you can explain to me what the reason is that you need both Git-sync and persistency. What do you miss if you just use GitSync?
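For anyone unsure what "atomic" means here: as far as I understand, git-sync checks each revision out into its own directory and then flips a symlink, so a reader sees either the old tree or the new one, never a mix. Below is a minimal Python sketch of that idea. It is not git-sync's actual code; `publish_atomically`, the `rev-<revision>` directory naming and the `current` link name are all made up for illustration, and it assumes a POSIX filesystem.

```python
import os
import tempfile
from pathlib import Path


def publish_atomically(root: Path, revision: str, files: dict[str, str]) -> Path:
    """Write one revision into its own directory, then flip the `current` symlink.

    Hypothetical helper for illustration only; names and layout are assumptions.
    """
    # Each revision gets a separate directory, so a half-written tree is
    # never visible through the `current` link.
    worktree = root / f"rev-{revision}"
    worktree.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        (worktree / name).write_text(content)

    # Create the new link under a temporary name, then rename it over the
    # old one. rename() is atomic on POSIX, so readers resolve `current`
    # to either the old revision or the new one.
    tmp_link = root / f".current-tmp-{revision}"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    os.symlink(worktree.name, tmp_link)
    os.replace(tmp_link, root / "current")
    return root / "current"


if __name__ == "__main__":
    demo_root = Path(tempfile.mkdtemp())
    publish_atomically(demo_root, "a1", {"example_dag.py": "# version 1\n"})
    publish_atomically(demo_root, "b2", {"example_dag.py": "# version 2\n"})
    print(os.readlink(demo_root / "current"))
```

A shared network volume gives you no such switch-over point: the other components read whatever has been written so far.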
Why do you REALLY need persistency, i.e. what do you THINK you will get by using it? The reasoning puzzles me; I have asked many times already and never got a satisfactory answer.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
