potiuk commented on issue #27476:
URL: https://github.com/apache/airflow/issues/27476#issuecomment-1305706506

   @ephraimbuddy  
   
   One reason, I think, is that this case makes very little sense. Maybe it
is unrelated, but I personally think that having both git-sync AND persistence
at the same time makes very little sense in general. I have yet to hear a good
argument why just "git-sync" would not cut it. No-one has yet been able to
sufficiently explain to me why they want to use Git Sync to atomically sync
DAGs to one machine and then let the "network file system" distribute them
further to all the other components. IMHO it makes no sense: it lacks the
atomicity guarantees that Git-sync provides while incurring much more traffic
and (usually) a lot of cost, because the Airflow Scheduler continuously
accesses the remote filesystem. That is what effectively happens when you use
persistence. There is no magic: the DAGs still need to be distributed.
   
   It is so much better to use Git as the synchronisation protocol. It is far
better designed and optimized for sharing even huge amounts of source code,
and it has everything needed to synchronise incremental changes (i.e. commits)
atomically (git-sync does this) and only when needed (when there is a new
commit). Plus it has this great capability: you can set it up as an init
container, and it will fully synchronise the whole folder before any Airflow
component is allowed to start, so all DAGs are already present in the folder.
All those properties are basically gone when you add persistence to Git Sync.
And IMHO you get no benefits when you add persistence. Only problems.
   
   For all practical purposes, having just git-sync plus a local filesystem for
each component, completely separated from each other, is way better IMHO.
   
   This is explained in detail in this blog post:
https://medium.com/apache-airflow/shared-volumes-in-airflow-the-good-the-bad-and-the-ugly-22e9f681afca
   
   @alexakra @chuxiangfeng: maybe you can explain to me why you need both
Git-sync and persistence. What do you miss if you just use GitSync? Why do you
REALLY need persistence, i.e. what do you THINK you will get by using it? The
reasoning puzzles me; I have already asked many times and never got a
satisfactory answer.