Thanks for the reply, I added some responses inline. On Fri, Aug 2, 2019 at 2:42 PM Ahmet Altay <[email protected]> wrote: > > There is a value in explicitly pinning the dependencies to be used in the > containers: > - It reproducibly produces the same container. This will be important once we > start release Beam container. By looking at a Beam release branch, one could > exactly figure out the set of dependencies available in a released container. > > - Package manager (pip) in this is notorious about resolving versions for > sub-dependencies, and sometimes picks incompatible dependencies. > - SImilarly this repdocubility is helpful with tests, that can work on the > same container version unaffected by numerous dependency version changes > happening in sub-dependecies.
I thought this may be the reason, it definitely makes sense to pin dependencies for reproducible builds. If that is the case though I think we should at least change the comment in base_image_requirements.txt to say so (I'm happy to write a patch for that). > In addition, I will argue that we will want to keep a form of > base_image_requirements.txt, in order to be able to install additional > dependencies on the container and we would not be able to get rid of this > mechanism. My PR doesn't remove base_image_requirements.txt. It keeps the file there but removes all the pinned beam dependencies, and instead just lists the beam tar.gz as a dependency directly, along with all of the additional dependencies. > > > I will suggest, If possible ensure that both files are modified > synchronously. This might be possible with precommit hooks although I am not > familiar. Yeah setting up something like that would help, but surely there's a better way to lock dependencies than to count on people manually updating this file (even with a precommit reminding them)? I'm not that familiar with the complexities of python package management, but it seems like we should be able to set up something with pip freeze or pipenv's Pipfile.lock. The thing that confounds me when I try to think through that setup though is how to reference a development build of beam. If I run pip freeze in the container the requirement for apache-beam is "2.16.0.dev0" which of course doesn't resolve when you turn around and try to install it. Maybe an automated solution would just replace the apache-beam line with /opt/apache/beam/tars/apache-beam.tar.gz[gcp]? > > > Ahmet > > > On Fri, Aug 2, 2019 at 2:20 PM Brian Hulette <[email protected]> wrote: >> >> I recently ran into a portable python precommit failure that led me to >> discover that python dependencies for the container are defined in two >> different places, in slightly different ways: in setup.py with version >> ranges [1], and in a base_image_requirements.txt file in the container >> directory with pinned versions [2]. The latter includes some dependencies >> that are commonly used in user code, in addition to the Beam dependencies, >> and states that "Specifying the versions manually helps to resolve >> dependency conflicts with other packages installed in the container." >> >> But isn't the purpose of a package manager to resolve those conflicts? It >> seems that we should be able to just list the beam tar.gz as a dependency >> and let pip resolve it. I wrote up a patch [3] to test this approach and it >> seems to work. The only issue I ran into is that cython _does_ needs to be >> installed separately before beam so that it can be used when installing >> beam. Would this be a reasonable approach to remove the duplicate dependency >> specifications? >> >> Apologies if I'm rehashing an old discussion, if there's already an ML >> thread or PR discussing this issue I'd be happy to take a look at it. >> >> Thanks, >> Brian >> >> [1] https://github.com/apache/beam/blob/master/sdks/python/setup.py >> [2] >> https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt >> [3] https://github.com/apache/beam/pull/9236 >> >> >> >> PS - What caused my precommit failure: >> I tested out the portable runner locally to debug the issue. It worked fine >> with the loopback worker, but failed when using the docker worker. I >> eventually determined the actual cause: I unintentionally relied on a >> protobuf 3.8.0 feature in the code I added. The setup.py requirement, used >> locally in the loopback worker, specified a range requirement which resolved >> to 3.9.0, but the dependency for the container was pinned to 3.6.1. I've >> proposed to resolve the issue by just bumping the lower bound of the range >> in setup.py to 3.8.0, to reflect the new requirement.
