There is a value in explicitly pinning the dependencies to be used in the containers: - It reproducibly produces the same container. This will be important once we start release Beam container. By looking at a Beam release branch, one could exactly figure out the set of dependencies available in a released container. - Package manager (pip) in this is notorious about resolving versions for sub-dependencies, and sometimes picks incompatible dependencies. - SImilarly this repdocubility is helpful with tests, that can work on the same container version unaffected by numerous dependency version changes happening in sub-dependecies.
In addition, I will argue that we will want to keep a form of base_image_requirements.txt, in order to be able to install additional dependencies on the container and we would not be able to get rid of this mechanism. I will suggest, If possible ensure that both files are modified synchronously. This might be possible with precommit hooks although I am not familiar. Ahmet On Fri, Aug 2, 2019 at 2:20 PM Brian Hulette <[email protected]> wrote: > I recently ran into a portable python precommit failure that led me to > discover that python dependencies for the container are defined in two > different places, in slightly different ways: in setup.py with version > ranges [1], and in a base_image_requirements.txt file in the container > directory with pinned versions [2]. The latter includes some dependencies > that are commonly used in user code, in addition to the Beam dependencies, > and states that "Specifying the versions manually helps to resolve > dependency conflicts with other packages installed in the container." > > But isn't the purpose of a package manager to resolve those conflicts? It > seems that we should be able to just list the beam tar.gz as a dependency > and let pip resolve it. I wrote up a patch [3] to test this approach and it > seems to work. The only issue I ran into is that cython _does_ needs to be > installed separately before beam so that it can be used when installing > beam. Would this be a reasonable approach to remove the duplicate > dependency specifications? > > Apologies if I'm rehashing an old discussion, if there's already an ML > thread or PR discussing this issue I'd be happy to take a look at it. > > Thanks, > Brian > > [1] https://github.com/apache/beam/blob/master/sdks/python/setup.py > [2] > https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt > [3] https://github.com/apache/beam/pull/9236 > > > > PS - What caused my precommit failure: > I tested out the portable runner locally to debug the issue. It worked > fine with the loopback worker, but failed when using the docker worker. I > eventually determined the actual cause: I unintentionally relied on a > protobuf 3.8.0 feature in the code I added. The setup.py requirement, used > locally in the loopback worker, specified a range requirement which > resolved to 3.9.0, but the dependency for the container was pinned to > 3.6.1. I've proposed to resolve the issue by just bumping the lower bound > of the range in setup.py to 3.8.0, to reflect the new requirement. >
