potiuk commented on a change in pull request #19189:
URL: https://github.com/apache/airflow/pull/19189#discussion_r737960560



##########
File path: airflow/operators/python.py
##########
@@ -338,6 +347,7 @@ def __init__(
         python_version: Optional[Union[str, int, float]] = None,
         use_dill: bool = False,
         system_site_packages: bool = True,
+        clone_airflow_virtualenv: bool = True,

Review comment:
       > they likely expect Airflow to be available when using the official 
Docker container (because currently it installs Airflow globally). That won't 
be true after this PR, so we need to call it out somewhere.`
   
   Actually the case you described will work just fine with my proposed change. 
This is why `clone_airflow_virtualenv`  is `True` by default. I made it works 
this way that (regardless of  `use-syste-packages`) when 
`clone_airflow_virtualenv` is `True` (default) and `python_version` is 
unspecified, `set-system-packages` is ignored (!) - we are not using create 
virtualenv  at all here and it's only used when we create virtualenv from 
scratch .
   
   There is the opposite side effect really. If someone used 
`system_site_packages=False` and did not specify `python_version` - the 
resulting venv will be clone of the airlfow one and actually WILL contain the 
system packages.
   
   This is the choice I made and I am not super happy about it - but I think we 
have no choice but choose what kind of behavioral change to implement that will 
have least problems. I think having "clone of airflow" virtualenv here causes 
least problems. It might slow down python interpreter bootstraping but IMHO it 
might at most "fix"  some cases rather than "break" them (for example macros 
not passedbecause airflow could not be imported). I hardly imagine someone 
choosing empty "venv" using the "airlfow" python interpreter and suffering if 
it is not empty.
   
   But yeah. It is behaviour change for sure.
   
   And Yes. I tried to specify it in the docstring for now  but we likely need 
more explanation (and that's why I hesitate if THIS change shoudl be done 
somewhat differently).
   
   However it's far more complex than just that case. It gets really funny when 
you understand how broken the current implementatio is when you add 
python_version to the mix. 
   
   Currently If you specify 'system_site_packages' use - for example 
`python_version=python37`and you locally run `python36` then you will not get 
`airflow` (It's not there!), but if you use no `python_version` or `python3` or 
'python36' - then you will actually get airflow in it.
   
   There is no "explicit" expectation of the 'airflow presence" whether airflow 
will be there or not - it depends on which python_version you use in 
combination which is your current python_verison. 🤯  
   
   The thing is that this `system_site_packages` flag is passed now as `raw` 
flag to the underlying venv creation method and it's "expectations are not 
really stated other than that. Different people might have different 
expectations about airflow being or not being there (based on their mix of 
python versions) and they might not even realise that if they change it - it 
will be different. Simply in most cases the PythonVirtualenv expects you to 
understand all the consequences of those different parameters passed "raw".  
   
   BUT not always (and this is where it gets evne more 🤯) when it comes to dill 
and pendulum (which in case you wonder are treated specially), 
   
   You might be suprised that some combination would not work at all anyway:
   
    `system-packages=True, python_version=<different than yours>, use_dill=True`
   
   This will not work (unless dill is already added as system dependency to the 
other python or you add it explicitly in requirements). So far, so good - that 
look "kinda ok" if we assume the "raw passing" of parameters and "understand 
what python versions you have and what's installed in them" mantra.  
   
   BUT
   
   This suprisingly WILL work even if you have no dill (same with pendulum) in 
your requirements:
   
   `system_packages=False, python_version=<different than yours>, 
use_dill=True` 
   
   WAT???? (https://www.youtube.com/watch?v=et8xNAc2ic8) 
   
   Why? Because  if missing dill and pendulum are added (but only when 
system_packages=False 🤯) .
   
   This is a very, very messy implementation . And we can't have 1-1 mapping 
here and perfect compatibility I am afraid, And I know - my proposal makes it 
even more messy. But I am kinda lost what better we could do. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to