I recently ran into a portable python precommit failure that led me to discover that python dependencies for the container are defined in two different places, in slightly different ways: in setup.py with version ranges [1], and in a base_image_requirements.txt file in the container directory with pinned versions [2]. The latter includes some dependencies that are commonly used in user code, in addition to the Beam dependencies, and states that "Specifying the versions manually helps to resolve dependency conflicts with other packages installed in the container."
But isn't the purpose of a package manager to resolve those conflicts? It seems that we should be able to just list the beam tar.gz as a dependency and let pip resolve it. I wrote up a patch [3] to test this approach and it seems to work. The only issue I ran into is that cython _does_ needs to be installed separately before beam so that it can be used when installing beam. Would this be a reasonable approach to remove the duplicate dependency specifications? Apologies if I'm rehashing an old discussion, if there's already an ML thread or PR discussing this issue I'd be happy to take a look at it. Thanks, Brian [1] https://github.com/apache/beam/blob/master/sdks/python/setup.py [2] https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt [3] https://github.com/apache/beam/pull/9236 PS - What caused my precommit failure: I tested out the portable runner locally to debug the issue. It worked fine with the loopback worker, but failed when using the docker worker. I eventually determined the actual cause: I unintentionally relied on a protobuf 3.8.0 feature in the code I added. The setup.py requirement, used locally in the loopback worker, specified a range requirement which resolved to 3.9.0, but the dependency for the container was pinned to 3.6.1. I've proposed to resolve the issue by just bumping the lower bound of the range in setup.py to 3.8.0, to reflect the new requirement.
