I recently ran into a portable python precommit failure that led me to
discover that python dependencies for the container are defined in two
different places, in slightly different ways: in setup.py with version
ranges [1], and in a base_image_requirements.txt file in the container
directory with pinned versions [2]. The latter includes some dependencies
that are commonly used in user code, in addition to the Beam dependencies,
and states that "Specifying the versions manually helps to resolve
dependency conflicts with other packages installed in the container."

But isn't the purpose of a package manager to resolve those conflicts? It
seems that we should be able to just list the beam tar.gz as a dependency
and let pip resolve it. I wrote up a patch [3] to test this approach and it
seems to work. The only issue I ran into is that cython _does_ needs to be
installed separately before beam so that it can be used when installing
beam. Would this be a reasonable approach to remove the duplicate
dependency specifications?

Apologies if I'm rehashing an old discussion, if there's already an ML
thread or PR discussing this issue I'd be happy to take a look at it.

Thanks,
Brian

[1] https://github.com/apache/beam/blob/master/sdks/python/setup.py
[2]
https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt
[3] https://github.com/apache/beam/pull/9236



PS - What caused my precommit failure:
I tested out the portable runner locally to debug the issue. It worked fine
with the loopback worker, but failed when using the docker worker. I
eventually determined the actual cause: I unintentionally relied on a
protobuf 3.8.0 feature in the code I added. The setup.py requirement, used
locally in the loopback worker, specified a range requirement which
resolved to 3.9.0, but the dependency for the container was pinned to
3.6.1. I've proposed to resolve the issue by just bumping the lower bound
of the range in setup.py to 3.8.0, to reflect the new requirement.

Reply via email to