potiuk commented on code in PR #23697:
URL: https://github.com/apache/airflow/pull/23697#discussion_r872261223
##########
docs/apache-airflow/extra-packages-ref.rst:
##########
@@ -294,7 +294,12 @@ Those are extras that provide support for integration with external systems via
 Bundle extras
 -------------
-Those are extras that install one ore more extras as a bundle.
+Those are extras that install one ore more extras as a bundle. Note that those extras should only be used for "development" version
+of Airflow - i.e. when Airflow is installed from sources. Because of the way how bundle extras are constructed they might not
+work when airflow is installed from 'PyPI`.
+
+If you want to install Airflow from PyPI with "all" extras (which should basically be never needed - you almost never need all extras from Airflow),
+you need to list explicitly all the non-bundle extras that you want to install.

Review Comment:
This is not a problem. Removing those devel-only requirements in an automated way is also extremely hard, borderline impossible.

Why is this not a problem? Constraints are just constraints, not requirements. They are merely metadata saying "if you are installing freezegun, you should be installing this particular version". They have no influence at all on whether freezegun gets installed.

Why is this difficult or even impossible? It is very difficult to separate those dependencies out automatically (in a future-safe way). We have no way of knowing whether a dependency is also used by some of our other, non-devel dependencies. With freezegun this is **probably** fine, since we know what freezegun is typically used for. But there have already been a few cases where something that "seemed" to be used only for devel was actually used by some of the packages we install, whether rightly or wrongly, through an implicit dependency. We could possibly try to figure it out by traversing the dependency tree, but that is doomed because of the way extras are treated (and not only Airflow extras, but the extras of other packages too).
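To make the extras problem concrete, here is a toy model in Python. Everything in it is an assumption for illustration only:

* The dependency graph is made up. It is a hypothetical chain in which `apache-airflow[apache.beam]` needs `apache-beam[pandas]`, which needs `pandas[freeze]`, which needs `freezegun<1.4`. This is not the real metadata of these projects.
* Versions are plain tuples.
* `resolve` is a hypothetical helper that does a simple breadth-first walk. It is nothing like pip's real backtracking resolver.
* Constraints are modelled as exact pins, like the `==` lines in Airflow's constraint files.

```python
from collections import deque

# HYPOTHETICAL metadata, not the real metadata of these projects.
# INDEX[package][extra] lists requirements; the key None holds the
# unconditional requirements. Each requirement is a tuple of
# (name, requested_extras, exclusive_upper_bound or None).
INDEX = {
    "apache-airflow": {None: [], "apache.beam": [("apache-beam", ("pandas",), None)]},
    "apache-beam": {None: [], "pandas": [("pandas", ("freeze",), None)]},
    "pandas": {None: [], "freeze": [("freezegun", (), (1, 4))]},
    "freezegun": {None: []},
    "some-devel-tool": {None: []},
}

# Versions available on the (equally hypothetical) package index.
AVAILABLE = {
    "apache-airflow": [(2, 3)],
    "apache-beam": [(2, 38)],
    "pandas": [(1, 4)],
    "freezegun": [(1, 2), (1, 3), (1, 5)],
    "some-devel-tool": [(1, 0), (2, 0)],
}


def resolve(root, root_extras, constraints):
    """Toy resolver: walk requirements, following extras, breadth-first.

    `constraints` maps package -> pinned version. Like a constraints file,
    a pin only narrows the choice for a package that something requires;
    it never causes a package to be installed on its own.

    Returns (installed, why): `installed` (name -> version) is all that
    survives the install; `why` (name -> [(parent, extra)]) is the
    provenance that only exists while resolving.
    """
    upper = {}  # package -> tightest exclusive upper bound seen so far
    why = {root: []}  # the root is requested directly by the user
    seen = set()  # (package, extra) pairs already expanded
    queue = deque([(root, tuple(root_extras))])
    while queue:
        name, extras = queue.popleft()
        # A requirement on pkg[x] means pkg's own requirements plus those of x.
        for extra in (None, *extras):
            if (name, extra) in seen:
                continue
            seen.add((name, extra))
            for dep, dep_extras, bound in INDEX[name].get(extra, []):
                why.setdefault(dep, []).append((name, extra))
                if bound is not None:
                    upper[dep] = min(bound, upper.get(dep, bound))
                queue.append((dep, dep_extras))

    installed = {}
    for name in why:
        candidates = [
            v
            for v in AVAILABLE[name]
            if (name not in upper or v < upper[name])
            and (name not in constraints or v == constraints[name])
        ]
        if not candidates:
            raise ValueError(f"no version of {name} satisfies all requirements")
        installed[name] = max(candidates)  # prefer the newest allowed version
    return installed, why


if __name__ == "__main__":
    pins = {"freezegun": (1, 3), "some-devel-tool": (2, 0)}
    installed, why = resolve("apache-airflow", ["apache.beam"], pins)
    print(installed)
    # {'apache-airflow': (2, 3), 'apache-beam': (2, 38), 'pandas': (1, 4), 'freezegun': (1, 3)}
    print(why["freezegun"])
    # [('pandas', 'freeze')]
```

The pin for `some-devel-tool` has no effect, because nothing requires it. The `why` mapping is the kind of in-memory state that is thrown away once installation finishes. `installed` only records that freezegun 1.3 is present. It does not record that freezegun came in through two layers of extras with a `<1.4` bound.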
The way `pip` treats extras is a bit unexpected (though there is no other good way to do it). Extras are optional and only meaningful during installation. The relation between a package and the extras that were installed with it is gone right after the installation: the dependency between the package and whatever was declared as its extra basically disappears the moment `pip install` completes. Unfortunately, when a package declares a dependency, it MIGHT request an extra of that dependency, and that leads to untraceable transitive dependencies. That dependency, in turn, MIGHT declare other dependencies in THEIR extras, and so on.

For example (this is not a real case, but it might happen): suppose we ourselves add `apache-beam[pandas]` as a dependency, `apache-beam[pandas]` adds `pandas[freeze]` as a dependency, and `pandas[freeze]` adds `freezegun<1.4` as a dependency. Then `airflow[apache.beam]` has an implicit, transitive dependency on `freezegun<1.4`. What's worse, outside of the `pip` resolution process we have no idea that:

a) freezegun is actually a non-devel transitive dependency of ours (through beam and pandas)
b) we require freezegun<1.4

This information is not available anywhere out of the box. To learn it, we would have to repeat the `pip` resolver process. We would essentially need to snapshot the information the resolver keeps in memory while it works out which packages should be installed. That is the only way to find out that `freezegun` is also one of our non-devel dependencies. If we removed freezegun from our constraints, someone might end up installing it in version 1.5 precisely because it is no longer in the constraints.

Long story short: freezegun and the example above are pretty artificial, but there is basically no way to generate the list of devel-only dependencies other than repeating what the `pip` resolver does.

-- This is an automated message from the Apache Git Service.
