[GitHub] [airflow] potiuk commented on issue #27476: Volume is missing for sshKeySecret when dag persistence is enabled.

GitBox Mon, 07 Nov 2022 06:34:54 -0800


potiuk commented on issue #27476:
URL: https://github.com/apache/airflow/issues/27476#issuecomment-1305706506

@ephraimbuddy

One reason, I would think, is that this case makes very little sense. Maybe it
is unrelated, but I personally think that having both git-sync AND persistency
at the same time makes very little sense in general. I have yet to hear a good
argument why just "git-sync" would not cut it. No one has yet been able to
sufficiently explain to me why they want to use git-sync to atomically sync
DAGs to one machine and then let the "network file system" distribute them
further to all the other components. IMHO it makes no sense: it loses the
atomicity guarantees that git-sync provides while incurring much more traffic
and (usually) generating a lot of cost from the Airflow Scheduler continuously
accessing remote filesystems. Because this is what effectively happens
when you use persistency. There is no magic and the DAGs need to be
distributed.

It is so much better to use Git as the synchronisation protocol - it's way
better designed and optimized to share even a huge amount of source code and
has everything that is needed to synchronise incremental changes (i.e. commits)
in an atomic way (git-sync does it) only when needed (when a new commit is
there). Plus it has this great capability that you can set it up as an init
container, and it will fully synchronise the whole folder before a component
of Airflow is allowed to start, so all DAGs are already present in the folder
at startup. All those properties are basically gone when you add persistency
to git-sync. And IMHO you have no benefits when you add persistency. Only
problems.
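
To illustrate what "atomic" means here, below is a minimal sketch of the
general "write to a fresh directory, then flip a symlink" technique. It is not
git-sync's actual code; the function name `publish_atomically`, the `current`
link name and the version labels are hypothetical, and it assumes a local
POSIX filesystem where renaming over an existing symlink is a single atomic
step.

```python
import os


def publish_atomically(root, version, files, link_name="current"):
    """Write `files` into root/version, then repoint root/link_name at it."""
    # 1. Materialise the new snapshot in its own directory; readers that
    #    follow the existing link never see this half-written state.
    target = os.path.join(root, version)
    os.makedirs(target)
    for name, content in files.items():
        with open(os.path.join(target, name), "w") as fh:
            fh.write(content)

    # 2. Create a temporary symlink next to the real one (relative target).
    link = os.path.join(root, link_name)
    tmp_link = os.path.join(root, "." + link_name + ".tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(version, tmp_link)

    # 3. Rename the temporary link over the real one. On POSIX this replaces
    #    the old link in one step, so a reader resolves either the old
    #    snapshot or the new one, never a mixture of both.
    os.replace(tmp_link, link)
    return link
```

A reader that always opens files through `root/current` sees one complete
snapshot at a time. That is the property that is hard to keep once the folder
is additionally redistributed through a shared network volume.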

For all practical purposes, having just git-sync and local filesystems
completely separated from each other for each component is way better IMHO.

This is explained in detail in this blog post
https://medium.com/apache-airflow/shared-volumes-in-airflow-the-good-the-bad-and-the-ugly-22e9f681afca

@alexakra @chuxiangfeng - maybe you can explain to me why you need both
git-sync and persistency. What do you miss if you just use git-sync? Why do
you REALLY need persistency - i.e. what do you THINK you will get by using it?
The reasoning puzzles me; I have asked many times already and never got a
satisfactory answer.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
