potiuk commented on PR #22740:
URL: https://github.com/apache/airflow/pull/22740#issuecomment-1090641375

   > I'll have to take a while to read in detail, but at first blush this looks 
like a tonne of machinery and tooling required just to make pipx work for this 
development application. I still worry about maintaining it all, future edge 
cases, and the cognitive overload for the user to understand what exactly is 
going on here. I suppose that I still just don't see what this approach is 
buying us that is worth all this effort/code, over just using a local 
executable/script. But as always I'm happy to commit and help maintain it if 
the rest of the community thinks we're getting a big win from using this 
approach :)
   
   Let me elaborate on that :). 
   
   
   I've been dealing with this several times in the past, in various 
incarnations, and what we get here is really the "best" approach so far.
   
   The root problem is not really a problem of `pipx`. It is changing from Bash 
to Python that creates this cognitive overload and the need for machinery.
   
   The "NICE" part of Bash is that it supposedly "just works" when you use just 
bash + POSIX tools. That's not entirely true any more, actually. Apple sticking 
to an old Bash version on macOS (due to licensing issues), some POSIX-y and old 
tooling on Mac, and the lack of support for Windows without WSL2 make it 
"almost just works". And I am the only one in the community who "likes" bash.
   
   Choosing Python is a good idea for Airflow. The reasoning is sound because 
all Airflow contributors know Python. But it has a caveat - you need to 
maintain a virtual environment for anything but the simplest scripts. There are 
optional dependencies that need to be installed in the right versions, they 
change over time, and new ones are - inevitably - added as you need them. And 
there is no easy way around that. Even now, these are the current dependencies 
for Breeze:
   
   ```
       click
       inputimeout
       importlib-metadata>=4.4; python_version < "3.8"
       pendulum
       psutil
       pytest
       pytest-xdist
       pyyaml
       requests
       rich
       rich_click
   ```
   
   And it is mind-boggling on its own that you have to maintain a separate 
small "virtualenv" to install a tool that manages the development environment 
for Breeze, which is also Python-based and has its own dependencies (but in a 
docker container). But you do, in fact. It's inevitable. Been there, done that 
a few times.
   
   THIS is the problem, not the `pipx`.
   
   The `pipx` solution actually makes it easier - because it "hides" the small 
venv and leaves you with a ready-to-use, almost-binary entrypoint. But it does 
not solve the upgrade/maintenance problem - it assumes that you have an 
installable tool and that you manually manage it when it changes, but ... we 
really want to make sure that we dynamically manage it when we add new stuff or 
dependencies. The problem with Python deps is that when they are not installed, 
you (as a Breeze user) find out by seeing a cryptic ImportError stack trace. 
Hardly a friendly message.
   
   And what happens next is that the developers (who only really care about 
developing Airflow code and do not know anything about the tool that manages 
their dev env) will complain that their tool stopped working and post the 
import error stack traces they got.
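   
   To illustrate the point about cryptic ImportErrors, here is a minimal sketch of a check that a tool could run at startup to turn a missing dependency into a readable hint. The names `REQUIRED_DISTRIBUTIONS`, `find_missing` and `friendly_message` are hypothetical and not taken from this PR; only the standard-library `importlib.metadata` calls are real.

```python
from importlib import metadata

# Hypothetical subset of the tool's dependencies, for illustration only.
REQUIRED_DISTRIBUTIONS = ["click", "rich", "pyyaml"]


def find_missing(distributions):
    """Return the distributions that are not installed in this environment."""
    missing = []
    for name in distributions:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def friendly_message(missing):
    """Build a readable hint, or an empty string when nothing is missing."""
    if not missing:
        return ""
    return (
        "Your dev tool environment is out of date. Missing packages: "
        + ", ".join(sorted(missing))
        + ". Please reinstall or upgrade the tool."
    )


if __name__ == "__main__":
    hint = friendly_message(find_missing(REQUIRED_DISTRIBUTIONS))
    if hint:
        print(hint)
```

   A check like this only reports the problem; it still leaves the actual upgrade to the user.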
   
   We basically "killed" the idea of a "script that just works" when we decided 
to switch to Python. Actually, `pipx` makes it easier to manage the env, not 
more difficult :).
   
   Ideally (and this was my initial idea) these kinds of tools might be written 
in 'golang'. And this is, for example, what `astro-cli` does 
https://github.com/astronomer/astro-cli  to manage the "DAG" development 
environment. It compiles to a statically linked per-platform binary, and adding 
new dependencies just causes a bigger binary.  
   
   But the big disadvantage of golang is that you need to build and distribute 
and upgrade the binary and that ... our community does not use/know golang.  So 
Python is a better choice for us - even if we need more "machinery" to keep it 
updated. 
   
   The `pipx` solution (with the machinery) gets as close to that as possible. 
You simply "self-upgrade" and you have a platform-independent "breeze" binary 
on the path - with all dependencies updated to the ones you need at this very 
version. Without worrying about it, without posting "I have that stack trace", 
"it does not work".
   
   Did you realise that you need `pyyaml` and `rich` and `inputimeout` 
installed to get the new Breeze up-and-running? 
   You should not even know that - honestly. What you need as a dependency for 
your tool is an internal detail. And this is what we get with this approach. 
You do not know what you need. And when tomorrow you need the `requests` 
library for whatever reason - you will not know it either. You will just get 
information that you should update and an offer to do it for you automatically 
if you answer "yes".


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
