On Fri, Aug 2, 2019 at 4:34 PM Brian Hulette <[email protected]> wrote:
> Thanks for the reply, I added some responses inline. > > On Fri, Aug 2, 2019 at 2:42 PM Ahmet Altay <[email protected]> wrote: > > > > There is a value in explicitly pinning the dependencies to be used in > the containers: > > - It reproducibly produces the same container. This will be important > once we start release Beam container. By looking at a Beam release branch, > one could exactly figure out the set of dependencies available in a > released container. > > > > - Package manager (pip) in this is notorious about resolving versions > for sub-dependencies, and sometimes picks incompatible dependencies. > > - SImilarly this repdocubility is helpful with tests, that can work on > the same container version unaffected by numerous dependency version > changes happening in sub-dependecies. > > I thought this may be the reason, it definitely makes sense to pin > dependencies for reproducible builds. If that is the case though I > think we should at least change the comment in > base_image_requirements.txt to say so (I'm happy to write a patch for > that). > Sounds good. > > > In addition, I will argue that we will want to keep a form of > base_image_requirements.txt, in order to be able to install additional > dependencies on the container and we would not be able to get rid of this > mechanism. > > My PR doesn't remove base_image_requirements.txt. It keeps the file > there but removes all the pinned beam dependencies, and instead just > lists the beam tar.gz as a dependency directly, along with all of the > additional dependencies. > Got it, thanks for the clarification. > > > > > > > I will suggest, If possible ensure that both files are modified > synchronously. This might be possible with precommit hooks although I am > not familiar. > > Yeah setting up something like that would help, but surely there's a > better way to lock dependencies than to count on people manually > updating this file (even with a precommit reminding them)? I'm not > that familiar with the complexities of python package management, but > it seems like we should be able to set up something with pip freeze or > pipenv's Pipfile.lock. The thing that confounds me when I try to think > through that setup though is how to reference a development build of > beam. If I run pip freeze in the container the requirement for > apache-beam is "2.16.0.dev0" which of course doesn't resolve when you > turn around and try to install it. Maybe an automated solution would > just replace the apache-beam line with > /opt/apache/beam/tars/apache-beam.tar.gz[gcp]? > I agree, this probably could be automated in a better way. I believe just doing "pip install /opt/apache/beam/tars/apache-beam.tar.gz[gcp]" at the beginning will prevent a later error for "pip install apache-beam==2.16.0.dev0". So we could have the former line in the docker file before installing the requirements. > > > > > > > Ahmet > > > > > > On Fri, Aug 2, 2019 at 2:20 PM Brian Hulette <[email protected]> > wrote: > >> > >> I recently ran into a portable python precommit failure that led me to > discover that python dependencies for the container are defined in two > different places, in slightly different ways: in setup.py with version > ranges [1], and in a base_image_requirements.txt file in the container > directory with pinned versions [2]. The latter includes some dependencies > that are commonly used in user code, in addition to the Beam dependencies, > and states that "Specifying the versions manually helps to resolve > dependency conflicts with other packages installed in the container." > >> > >> But isn't the purpose of a package manager to resolve those conflicts? > It seems that we should be able to just list the beam tar.gz as a > dependency and let pip resolve it. I wrote up a patch [3] to test this > approach and it seems to work. The only issue I ran into is that cython > _does_ needs to be installed separately before beam so that it can be used > when installing beam. Would this be a reasonable approach to remove the > duplicate dependency specifications? > >> > >> Apologies if I'm rehashing an old discussion, if there's already an ML > thread or PR discussing this issue I'd be happy to take a look at it. > >> > >> Thanks, > >> Brian > >> > >> [1] https://github.com/apache/beam/blob/master/sdks/python/setup.py > >> [2] > https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt > >> [3] https://github.com/apache/beam/pull/9236 > >> > >> > >> > >> PS - What caused my precommit failure: > >> I tested out the portable runner locally to debug the issue. It worked > fine with the loopback worker, but failed when using the docker worker. I > eventually determined the actual cause: I unintentionally relied on a > protobuf 3.8.0 feature in the code I added. The setup.py requirement, used > locally in the loopback worker, specified a range requirement which > resolved to 3.9.0, but the dependency for the container was pinned to > 3.6.1. I've proposed to resolve the issue by just bumping the lower bound > of the range in setup.py to 3.8.0, to reflect the new requirement. >
