potiuk commented on a change in pull request #19189:
URL: https://github.com/apache/airflow/pull/19189#discussion_r737960560
##########
File path: airflow/operators/python.py
##########
@@ -338,6 +347,7 @@ def __init__(
python_version: Optional[Union[str, int, float]] = None,
use_dill: bool = False,
system_site_packages: bool = True,
+ clone_airflow_virtualenv: bool = True,
Review comment:
> they likely expect Airflow to be available when using the official
Docker container (because currently it installs Airflow globally). That won't
be true after this PR, so we need to call it out somewhere.`
Actually the case you described will work just fine with my proposed change.
This is why `clone_airflow_virtualenv` is `True` by default. I made it works
this way that (regardless of `use-syste-packages`) when
`clone_airflow_virtualenv` is `True' (default) and `python_version` is
unspecified, `set-system-packages` is ignored (!) - we are not using create
virtualenv at all here and it's only used when we create virtualenv from
scratch .
There is the opposite side effect really. If someone used
`system_site_packages=False` and did not specify `python_version` - the
resulting venv will be clone of the airlfow one and actually WILL contain the
system packages.
This is the choice I made and I am not super happy about it - but I think we
have no choice but choose what kind of behavioral change to implement that will
have least problems. I think having "clone of airflow" virtualenv here causes
least problems. It might slow down python interpreter bootstraping but IMHO it
might at most "fix" some cases rather than "break" them (for example macros
not passedbecause airflow could not be imported). I hardly imagine someone
choosing empty "venv" using the "airlfow" python interpreter and suffering if
it is not empty.
But yeah. It is behaviour change for sure.
And Yes. I tried to specify it in the docstring for now but we likely need
more explanation (and that's why I hesitate if THIS change shoudl be done
somewhat differently).
However it's far more complex than just that case. It gets really funny when
you understand how broken the current implementatio is when you add
python_version to the mix.
Currently If you specify 'system_site_packages' use - for example
`python_version=python37`and you locally run `python36` then you will not get
`airflow` (It's not there!), but if you use no `python_version` or `python3` or
'python36' - then you will actually get airflow in it.
There is no "explicit" expectation of the 'airflow presence" whether airflow
will be there or not - it depending on which python_version you use in
combination which is your current python_verison. 🤯
The thing is that this `system_site_packages` flag is passed now as `raw`
flag to the underlying venv creation method and it's "expectations are not
really stated other than that. Different people might have different
expectations about airflow being or not being there (based on their mix of
python versions) and they might not even realise that if they change it - it
will be different. Simply in most cases the PythonVirtualenv expects you to
understand all the consequences of those different parameters passed "raw".
BUT not always (and this is where it gets evne more 🤯) when it comes to dill
and pendulum (which in case you wonder are treated specially),
You might be suprised that some combination would not work at all anyway:
`system-packages=True, python_version=<different than yours>, use_dill=True`
This will not work (unless dill is already added as system dependency to the
other python or you add it explicitly in requirements). So far, so good - that
look "kinda ok" if we assume the "raw passing" of parameters and "understand
what python versions you have and what's installed in them" mantra.
BUT
This suprisingly WILL work even if you have no dill (same with pendulum) in
your requirements:
`system_packages=False, python_version=<different than yours>,
use_dill=True`
WAT???? (https://www.youtube.com/watch?v=et8xNAc2ic8)
Why? Because if missing dill and pendulum are added (but only when
system_packages=False 🤯) .
This is a very, very messy implementation . And we can't have 1-1 mapping
here and perfect compatibility I am afraid, And I know - my proposal makes it
even more messy. But I am kinda lost what better we could do.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]