potiuk commented on PR #22740:
URL: https://github.com/apache/airflow/pull/22740#issuecomment-1090641375
> I'll have to take a while to read in detail, but at first blush this looks
like a tonne of machinery and tooling required just to make pipx work for this
development application. I still worry about maintaining it all, future edge
cases, and the cognitive overload for the user to understand what exactly is
going on here. I suppose that I still just don't see what this approach is
buying us that is worth all this effort/code, over just using a local
executable/script. But as always I'm happy to commit and help maintain it if
the rest of the community thinks we're getting a big win from using this
approach :)
Let me elaborate on that :).
I've been dealing with this several times in the past in various
incarnations, and what we get here is really the "best" approach so far.
The root problem is not really a problem of `pipx`. It is changing from Bash
to Python that creates this cognitive overload and the need for machinery.
The "NICE" part of Bash is that it supposedly "just works" when you use only
bash + POSIX tools. That is actually not entirely true any more. Apple sticking
to an old Bash version on macOS (due to licensing issues), the POSIX-ish and
outdated tooling on Mac, and the lack of Windows support without WSL2 mean that
it only "almost just works". And I am the only one in the community who "likes"
Bash.
Choosing Python is a good idea for Airflow. It is well reasoned, because all
Airflow contributors know Python. But it has a caveat: you need to maintain a
virtual environment for anything but the simplest scripts. There are additional
dependencies that need to be installed, in the right versions; they change over
time, and new ones are inevitably added as you need them. There are no easy
ways around that. Even now, these are the current dependencies of Breeze:
```
click
inputimeout
importlib-metadata>=4.4; python_version < "3.8"
pendulum
psutil
pytest
pytest-xdist
pyyaml
requests
rich
rich_click
```
And it is mind-boggling on its own that you have to maintain a separate small
virtualenv just to install Breeze, a Python tool that manages a development
environment which is itself Python-based and has its own dependencies (but
inside a Docker container). But you do, in fact. It's inevitable. Been there,
done that a few times.
THIS is the problem, not `pipx`.
The `pipx` solution actually makes it easier, because it "hides" the small
venv and leaves you with a ready-to-use, almost-binary entrypoint. But it does
not solve upgrades and maintenance: it assumes that you have an installable
tool and that you manage it manually when it changes. We, however, really want
to manage it dynamically whenever we add new features and dependencies. The
problem with Python dependencies is that when they are not installed, you (as
a Breeze user) find out by seeing a cryptic ImportError stack trace. Hardly a
friendly message.
What happens next is that developers (who only really care about developing
Airflow code and know nothing about the tool that manages their dev env)
complain that their tool stopped working and post the ImportError stack traces
they got.
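One way to avoid the cryptic ImportError is sketched below. It assumes we check up front that each dependency's top-level module can be found, and print an actionable message if any are missing. The helper names and the distribution-to-module mapping are hypothetical and are not Breeze's actual code. Distribution names and module names often differ: `pyyaml` provides `yaml`, and `pytest-xdist` provides `xdist`.

```python
import importlib.util
from typing import Dict, List

# Hypothetical mapping: distribution name (as in the requirements list)
# -> top-level module it provides. importlib-metadata is omitted because
# it is only needed on Python < 3.8.
REQUIRED_MODULES: Dict[str, str] = {
    "click": "click",
    "inputimeout": "inputimeout",
    "pendulum": "pendulum",
    "psutil": "psutil",
    "pytest": "pytest",
    "pytest-xdist": "xdist",
    "pyyaml": "yaml",
    "requests": "requests",
    "rich": "rich",
    "rich_click": "rich_click",
}


def find_missing(required: Dict[str, str]) -> List[str]:
    """Return distribution names whose top-level module cannot be found."""
    # find_spec returns None (without importing) for a missing top-level module.
    return [dist for dist, module in required.items()
            if importlib.util.find_spec(module) is None]


def check_dependencies(required: Dict[str, str]) -> bool:
    """Print a friendly message instead of letting an ImportError escape."""
    missing = find_missing(required)
    if missing:
        print("Some dependencies of this tool are missing: "
              + ", ".join(sorted(missing)))
        print("Please reinstall/upgrade the tool rather than debugging "
              "the ImportError.")
        return False
    return True
```

A tool entrypoint would call `check_dependencies(REQUIRED_MODULES)` before importing anything else and exit early when it returns `False`.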
We basically "killed" the idea of a "script that just works" when we decided
to switch to Python. If anything, `pipx` makes the environment easier to
manage, not more difficult :).
Ideally (and this was my initial idea), tools like this would be written in
Go ('golang'). This is, for example, what `astro-cli`
(https://github.com/astronomer/astro-cli) does to manage a "DAG" development
environment. It compiles to a statically linked, per-platform binary, and adding
new dependencies just results in a bigger binary.
But the big disadvantage of golang is that you need to build, distribute and
upgrade the binary, and that ... our community does not use or know golang. So
Python is a better choice for us, even if we need more "machinery" to keep it
updated.
The `pipx` solution (with the machinery) gets as close to that as possible. You
simply "self-upgrade" and you have a platform-independent `breeze` binary on
the path, with all dependencies updated to the ones you need at this very
version. No worrying about it, no posting "I have that stack trace" or "it does
not work".
Did you realise that you need `pyyaml`, `rich` and `inputimeout` installed
to get the new Breeze up and running?
You should not even have to know that, honestly. What your tool needs as a
dependency is an internal detail, and this is what we get with this approach:
you do not need to know what is required. And when tomorrow you need the
`requests` library for whatever reason, you will not need to know that either.
You will just be told that you should update, along with an offer to do it for
you automatically if you answer "yes".
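Here is a minimal sketch of the "offer to update" flow, under stated assumptions. The installed tool is assumed to remember a hash of the file that declares its dependencies (here a `setup.cfg`) from install time. It compares that hash with the current checkout and, if they differ, asks before reinstalling via `pipx`. The function names, the stored-hash mechanism and the exact reinstall command (`pipx install -e ./dev/breeze --force`) are illustrative assumptions. They are not necessarily what this PR implements.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Callable, List

# Hypothetical reinstall command; assumes the tool's sources live in ./dev/breeze.
UPGRADE_COMMAND: List[str] = ["pipx", "install", "-e", "./dev/breeze", "--force"]


def requirements_hash(setup_file: Path) -> str:
    """Hash the file that declares the tool's dependencies."""
    return hashlib.sha256(setup_file.read_bytes()).hexdigest()


def needs_upgrade(setup_file: Path, installed_hash: str) -> bool:
    """True when the dependency declaration changed since installation.

    installed_hash is assumed to have been recorded when the tool was installed.
    """
    return requirements_hash(setup_file) != installed_hash


def maybe_upgrade(
    setup_file: Path,
    installed_hash: str,
    ask: Callable[[str], str] = input,
    run: Callable[[List[str]], int] = subprocess.call,
) -> bool:
    """Offer to reinstall when dependencies changed; return True if reinstalled."""
    if not needs_upgrade(setup_file, installed_hash):
        return False  # Up to date: do not bother the user.
    answer = ask("The tool's dependencies changed. Upgrade now? [y/N] ")
    if answer.strip().lower() in ("y", "yes"):
        # A zero exit code from the reinstall command means it succeeded.
        return run(UPGRADE_COMMAND) == 0
    print("Skipping upgrade. Run this when ready: " + " ".join(UPGRADE_COMMAND))
    return False
```

Injecting `ask` and `run` keeps the prompt and the subprocess call swappable. That way the decision logic can be exercised without touching the real environment.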