potiuk commented on code in PR #23697:
URL: https://github.com/apache/airflow/pull/23697#discussion_r872261223


##########
docs/apache-airflow/extra-packages-ref.rst:
##########
@@ -294,7 +294,12 @@ Those are extras that provide support for integration with 
external systems via
 Bundle extras
 -------------
 
-Those are extras that install one ore more extras as a bundle.
+Those are extras that install one ore more extras as a bundle. Note that those 
extras should only be used for "development" version
+of Airflow - i.e. when Airflow is installed from sources. Because of the way 
how bundle extras are constructed they might not
+work when airflow is installed from 'PyPI`.
+
+If you want to install Airflow from PyPI with "all" extras (which should 
basically be never needed - you almost never need all extras from Airflow),
+you need to list explicitly all the non-bundle extras that you want to install.

Review Comment:
   This is not a problem, and removing those devel-only requirements is 
extremely hard, borderline impossible, to do in an automated way. 
   
   Why is this not a problem?
   
   Constraints are just constraints, not requirements. They are merely metadata 
telling `pip` "if you are installing freezegun, you should install this 
particular version". They have zero influence on whether freezegun will be 
installed at all.
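
   A minimal sketch of that distinction, as a toy model rather than `pip`'s 
actual implementation. The function name `plan_install` and the pinned versions 
are made up for illustration; with real `pip`, the constraints file is the one 
passed via `-c` / `--constraint`.

```python
def plan_install(requested, constraints):
    """Toy model: a constraint only narrows the version of a requested package."""
    plan = {}
    for name in requested:
        # No constraint entry means "any version"; a constraint never adds a package.
        plan[name] = constraints.get(name, "any")
    return plan


# Hypothetical pins, standing in for a constraints file.
constraints = {"freezegun": "==1.2.1", "requests": "==2.27.1"}

plan = plan_install(["requests"], constraints)
print(plan)  # freezegun is constrained but never requested, so it is not in the plan
```

   In this model freezegun appears in the plan only if something requests it, 
which is why leaving it in the constraints is harmless.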
   
   Why is this difficult or even impossible?
   
   The problem is that it might be very difficult to separate those out 
automatically (in a future-safe way), because we have no idea whether a 
dependency is also used by some of our dependencies beyond our devel deps. 
   
   With freezegun, this is **probably** fine, as you know what freezegun is 
typically used for, but there have already been a few cases where something 
that "seemed" to be used only for devel was actually used (rightly or wrongly, 
there was an implicit dependency) by some packages we installed.
   
   We could possibly try to figure it out by traversing the dependency tree, 
but this is doomed because of the way extras are treated (and not only Airflow 
extras but other packages' extras too). The way `pip` treats extras is a bit 
unexpected (but there is no other good way): they are optional and only valid 
during installation, and the relation between a package and the extras that 
were installed for it is gone right after the installation. The fact that there 
is a dependency between the package and whatever is declared as an extra 
basically disappears the moment `pip install` completes.
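
   One way to see this for yourself, sketched with the standard-library 
`importlib.metadata` (the helper name `extras_view` is made up here). Installed 
metadata lists which extras a distribution *declares* and which requirements 
are conditional on an extra, but, as far as I know, nothing in it records which 
extras were actually requested at install time.

```python
from importlib import metadata


def extras_view(dist):
    """Summarise what the installed metadata says about a distribution's extras."""
    declared = dist.metadata.get_all("Provides-Extra") or []
    # Requirements guarded by an `extra == "..."` marker (the part after the ';').
    conditional = [
        req for req in (dist.requires or [])
        if "extra" in req.partition(";")[2]
    ]
    return {"declared": declared, "conditional": conditional}


# Collect the view for everything installed in the current environment.
views = {dist.metadata["Name"]: extras_view(dist) for dist in metadata.distributions()}

for name, view in views.items():
    if view["declared"]:
        print(name, "declares extras:", view["declared"])
```

   The output depends entirely on the environment, but in no case does it say 
"this package was installed with extra X".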
   
   Unfortunately, when a package declares a dependency, it MIGHT add an extra, 
and that leads to untraceable transitive dependencies. And that dependency 
MIGHT declare other dependencies in THEIR extras, and it can go on and on.
   
   For example: 
   
   If we ourselves (and this is not a real case, but it might happen) add 
`apache-beam[pandas]` as a dependency, and `apache-beam[pandas]` adds 
`pandas[freeze]` as a dependency, and `pandas[freeze]` adds `freezegun<1.4` as 
a dependency, then we have an implicit, transitive dependency on 
`freezegun<1.4` from `airflow[apache.beam]`. 
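
   The chain above can be sketched over a hand-written index. Everything in 
`INDEX` is the made-up example from this comment, not real package metadata, 
and `find_chains` is a hypothetical helper: it only handles a single extra per 
requirement and assumes there are no cycles.

```python
import re

# Hypothetical metadata: (package, extra) -> requirements pulled in by that extra.
INDEX = {
    ("apache-airflow", "apache.beam"): ["apache-beam[pandas]"],
    ("apache-beam", "pandas"): ["pandas[freeze]"],
    ("pandas", "freeze"): ["freezegun<1.4"],
}

# name, optional [extra], then whatever version specifier follows.
REQ = re.compile(r"^([A-Za-z0-9._-]+)(?:\[([^\]]+)\])?(.*)$")


def find_chains(package, extra, target, path=()):
    """Yield every requirement chain from package[extra] that ends at target."""
    for req in INDEX.get((package, extra), []):
        name, req_extra, _spec = REQ.match(req).groups()
        chain = path + (req,)
        if name == target:
            yield chain
        elif req_extra:
            # Follow the extra of the dependency, one level deeper.
            yield from find_chains(name, req_extra, target, chain)


chains = list(find_chains("apache-airflow", "apache.beam", "freezegun"))
print(chains)
```

   This only works because the whole index is written down in one place; for 
real packages that information exists only while the resolver is running.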
   
   And what's worse, outside of the `pip` resolution process we have no idea that:
   
   a) freezegun is actually our non-devel transitive dependency (through beam 
and pandas)
   b) we require freezegun < 1.4
   
   This information is not available anywhere out-of-the-box. In order to know 
that, we would have to repeat the `pip` resolver process and basically take a 
snapshot of the information that the resolver keeps in memory to find out which 
packages should be installed. This is the only way to find out that `freezegun` 
is our non-devel dependency as well.
   
   If we remove freezegun from our constraints, someone might end up installing 
version 1.5 precisely because we no longer have it in the constraints. 
   
   Long story short - freezegun and the above example are pretty artificial, 
but there is basically no way (other than repeating what the `pip` resolver 
does) to generate the list of dependencies that are devel-only. 
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
