[GitHub] [airflow] potiuk commented on issue #27476: Volume is missing for sshKeySecret when dag persistence is enabled.

GitBox Tue, 08 Nov 2022 02:43:29 -0800


potiuk commented on issue #27476:
URL: https://github.com/apache/airflow/issues/27476#issuecomment-1307001887

> First, The initial idea was to use the external dag files as a repository
for code, but it was later discovered that git-sync might be a better option.
So git-sync was turned on but not set dags.persistence to false, both were set
to true, and the official documentation showed that it was possible to do this.

Yep. I understand that. I am gathering ideas and evidence to make a
stronger case for strongly discouraging turning on both persistence and
git-sync. As mentioned in
https://medium.com/apache-airflow/shared-volumes-in-airflow-the-good-the-bad-and-the-ugly-22e9f681afca
- this is my (strong) opinion, but it is otherwise a controversial subject.
It's very difficult to convince people who think that "shared volumes" are just
K8S magic that works.

If you do not dig deeply enough to understand that git-sync and shared volumes
are actually doing the same thing in parallel, and that git-sync's performance
optimisations are way better for the Airflow case, it's very difficult to
see that this is the case (see the discussion above).

But - as discussed above - I have not yet seen a single argument that
convinces me that there is even a single case where git-sync + persistence is
better at anything than just git-sync (and it is worse in a number of aspects).
So far I have seen precisely `0` arguments and cases that could convince me.

I hope I will eventually manage to convince other committers and PMC members
that we should strongly discourage it in the docs, and that rather than
"fixing" the case we should at least warn users that they should not do it (or
maybe even outright disable this possibility in our chart).

But we are not there yet.
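
As a rough illustration of what warning users about this combination could
rest on, here is a minimal Python sketch. It assumes the chart values are
already loaded into a plain dict. The helper names `check_dag_sources` and
`_enabled` and the message text are hypothetical; only the
`dags.persistence.enabled` and `dags.gitSync.enabled` keys are meant to mirror
the chart values discussed in this issue. This is not something the chart
does today.

```python
from typing import Any, List, Mapping


def _enabled(values: Mapping[str, Any], section: str) -> bool:
    """Return True if dags.<section>.enabled is truthy; missing keys count as False."""
    dags = values.get("dags") or {}
    return bool((dags.get(section) or {}).get("enabled", False))


def check_dag_sources(values: Mapping[str, Any]) -> List[str]:
    """Hypothetical helper: collect warnings about conflicting DAG sources."""
    warnings: List[str] = []
    # Both mechanisms deliver DAG files, so enabling both is the case to flag.
    if _enabled(values, "gitSync") and _enabled(values, "persistence"):
        warnings.append(
            "dags.gitSync.enabled and dags.persistence.enabled are both true: "
            "git-sync and the shared volume would both deliver DAG files; "
            "consider enabling only one of them."
        )
    return warnings
```

With only one of the two options enabled (or neither), the helper returns an
empty list, so it could be wired in as a warning rather than a hard failure.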

> Second, in the development phase, there is not much resource in the local
built airflow, when testing or debugging code, does not want to add some
temporary code to submit to the git warehouse, only to modify the local file to
take effect, all debugging completed finally together to submit to git, rather
than some useless debugging process code submission

Yeah. As I mentioned in the article - shared volumes are perfect for such an
"easy start" and quick iterations. But having git-sync + DAG persistence in
this case makes no sense (git-sync will clash with manually overwritten files,
so it is a very bad idea to have both if you want to manually iterate on those
files without git). This is the relevant excerpt (in the "good" part):

> And if your users are mostly data scientists, who are used to iterate and
change their files locally and experiment and quickly deploy stuff by just copy
pasting they do not need to learn any new tools, nor follow any rigorous
deployment workflow — hey we just copy file here and … it works.

> Sounds cool? Yeah. Because it is cool.

> When you have a small-ish installation and a handful of DAGs that are
mostly accessed by one user, this is a perfect solution. And yeah, in such a
case I’d heartily recommend deploying Airflow with shared volumes.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
