potiuk commented on a change in pull request #12548:
URL: https://github.com/apache/airflow/pull/12548#discussion_r529481604
##########
File path: setup.py
##########
@@ -658,9 +655,10 @@ def write_version(filename: str = os.path.join(*[my_dir,
"airflow", "git_version
EXTRAS_PROVIDERS_PACKAGES: Dict[str, Iterable[str]] = {
'all': list(PROVIDERS_REQUIREMENTS.keys()),
- # this is not 100% accurate with devel_ci definition, but we really want
to have all providers
- # when devel_ci extra is installed!
+ # this is not 100% accurate with devel_ci and devel_all definition, but we
really want
+ # to have all providers when devel_ci extra is installed!
'devel_ci': list(PROVIDERS_REQUIREMENTS.keys()),
+ 'devel_all': list(PROVIDERS_REQUIREMENTS.keys()),
Review comment:
The `apache.beam` is an optional dependency that is needed for Dataflow
https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/dataflow.py.
I think @mik-laj worked mostly on this (but it has been there for quite some
time before @mik-laj improved it and made a truly optional requirement).
As far as I know, previously "beam" requirements were implicitly needed when
you wanted to run the Dataflow operator. But @mik-laj changed it in the way
that you can optionally install some additional packages (including your own
version of beam package if you need). in a virtualenv dynamically created when
the task is running.
So technically speaking beam is not needed for dataflow, but the additional
virtualenv has many drawbacks - it takes quite time to install, in many
environments (including composer) you are not allowed to install additional
packages on-the-flight and you can have very brittle environment - for example
you can have conflicts when installing this new packages. Therefore the
possibility of using pre-installed apache-beam (as airflow extra) is also
retained - in this case the extra is useful if you want to install for example
only google provider and use only that (so that the many dependencies of
apache.beam do not conflict with dependencies of other extras).
That's why beam has been excluded from devel_ci - because we are running CI
tests with the assumption that someone uses extra virtualenv to run the Python
Dataflow operator.
I am also using this extra to install everything needed when I verify
backport operators. During this check, I import all classes from all providers
and if we have no "apache.beam" installed - some of the classes cannot be
imported.
Here is the relevant line from the Dataflow operator:
https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/dataflow.py#L796
I hope it explains it all? And of course we can propose another solution for
that, maybe simply `devel_plus_beam` for example should be a good name for the
extra rather than `devel_all`. That correctly reflects the expectation.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]