potiuk commented on a change in pull request #12548:
URL: https://github.com/apache/airflow/pull/12548#discussion_r529481604



##########
File path: setup.py
##########
@@ -658,9 +655,10 @@ def write_version(filename: str = os.path.join(*[my_dir, 
"airflow", "git_version
 
 EXTRAS_PROVIDERS_PACKAGES: Dict[str, Iterable[str]] = {
     'all': list(PROVIDERS_REQUIREMENTS.keys()),
-    # this is not 100% accurate with devel_ci definition, but we really want 
to have all providers
-    # when devel_ci extra is installed!
+    # this is not 100% accurate with devel_ci and devel_all definition, but we 
really want
+    # to have all providers when devel_ci extra is installed!
     'devel_ci': list(PROVIDERS_REQUIREMENTS.keys()),
+    'devel_all': list(PROVIDERS_REQUIREMENTS.keys()),

Review comment:
       The `apache.beam` is an optional dependency that is needed for Dataflow  
https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/dataflow.py.
   
   I think @mik-laj worked mostly on this (but it has been there for quite some 
time before @mik-laj improved it and made a truly optional requirement). 
   
   As far as I know, previously "beam" requirements were implicitly needed when 
you wanted to run the Dataflow operator. But @mik-laj changed it in the way 
that you can optionally install some additional packages (including your own 
version of beam package if you need). in a virtualenv dynamically created when 
the task is running. 
   
   So technically speaking beam is not needed for dataflow, but the additional 
virtualenv has many drawbacks - it takes quite  time to install, in many 
environments (including composer) you are not allowed to install additional 
packages on-the-flight and you can have very brittle environment - for example 
you can have conflicts when installing this new packages. Therefore the 
possibility of using pre-installed apache-beam (as airflow extra) is also 
retained - in this case the extra is useful if you want to install for example 
only google provider and use only that (so that the many dependencies of 
apache.beam do not conflict with dependencies of other extras).
   
   That's why beam has been excluded from devel_ci - because we are running CI 
tests with the assumption that someone uses extra virtualenv to run the Python 
Dataflow operator.
   
   I am also using this extra to install everything needed when I verify 
backport operators. During this check, I import all classes from all providers 
and if we have no "apache.beam" installed - some of the classes cannot be 
imported.
   
   Here is the relevant line from the Dataflow operator: 
https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/dataflow.py#L796
   
   I hope it explains it all? And of course we can propose another solution for 
that, maybe simply `devel_plus_beam` for example should be a good name for the 
extra rather than `devel_all`. That correctly reflects the expectation.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to