On Fri, Aug 2, 2019 at 4:34 PM Brian Hulette <[email protected]> wrote:

> Thanks for the reply, I added some responses inline.
>
> On Fri, Aug 2, 2019 at 2:42 PM Ahmet Altay <[email protected]> wrote:
> >
> > There is a value in explicitly pinning the dependencies to be used in
> the containers:
> > - It reproducibly produces the same container. This will be important
> once we start release Beam container. By looking at a Beam release branch,
> one could exactly figure out the set of dependencies available in a
> released container.
> >
> > - Package manager (pip) in this is notorious about resolving versions
> for sub-dependencies, and sometimes picks incompatible dependencies.
> > - SImilarly this repdocubility is helpful with tests, that can work on
> the same container version unaffected by numerous dependency version
> changes happening in sub-dependecies.
>
> I thought this may be the reason, it definitely makes sense to pin
> dependencies for reproducible builds. If that is the case though I
> think we should at least change the comment in
> base_image_requirements.txt to say so (I'm happy to write a patch for
> that).
>

Sounds good.


>
> > In addition, I will argue that we will want to keep a form of
> base_image_requirements.txt, in order to be able to install additional
> dependencies on the container and we would not be able to get rid of this
> mechanism.
>
> My PR doesn't remove base_image_requirements.txt. It keeps the file
> there but removes all the pinned beam dependencies, and instead just
> lists the beam tar.gz as a dependency directly, along with all of the
> additional dependencies.
>

Got it, thanks for the clarification.


>
> >
> >
> > I will suggest, If possible ensure that both files are modified
> synchronously. This might be possible with precommit hooks although I am
> not familiar.
>
> Yeah setting up something like that would help, but surely there's a
> better way to lock dependencies than to count on people manually
> updating this file (even with a precommit reminding them)? I'm not
> that familiar with the complexities of python package management, but
> it seems like we should be able to set up something with pip freeze or
> pipenv's Pipfile.lock. The thing that confounds me when I try to think
> through that setup though is how to reference a development build of
> beam. If I run pip freeze in the container the requirement for
> apache-beam is "2.16.0.dev0" which of course doesn't resolve when you
> turn around and try to install it. Maybe an automated solution would
> just replace the apache-beam line with
> /opt/apache/beam/tars/apache-beam.tar.gz[gcp]?
>

I agree, this probably could be automated in a better way.

I believe just doing "pip
install /opt/apache/beam/tars/apache-beam.tar.gz[gcp]" at the beginning
will prevent a later error for "pip install apache-beam==2.16.0.dev0". So
we could have the former line in the docker file before installing the
requirements.


>
> >
> >
> > Ahmet
> >
> >
> > On Fri, Aug 2, 2019 at 2:20 PM Brian Hulette <[email protected]>
> wrote:
> >>
> >> I recently ran into a portable python precommit failure that led me to
> discover that python dependencies for the container are defined in two
> different places, in slightly different ways: in setup.py with version
> ranges [1], and in a base_image_requirements.txt file in the container
> directory with pinned versions [2]. The latter includes some dependencies
> that are commonly used in user code, in addition to the Beam dependencies,
> and states that "Specifying the versions manually helps to resolve
> dependency conflicts with other packages installed in the container."
> >>
> >> But isn't the purpose of a package manager to resolve those conflicts?
> It seems that we should be able to just list the beam tar.gz as a
> dependency and let pip resolve it. I wrote up a patch [3] to test this
> approach and it seems to work. The only issue I ran into is that cython
> _does_ needs to be installed separately before beam so that it can be used
> when installing beam. Would this be a reasonable approach to remove the
> duplicate dependency specifications?
> >>
> >> Apologies if I'm rehashing an old discussion, if there's already an ML
> thread or PR discussing this issue I'd be happy to take a look at it.
> >>
> >> Thanks,
> >> Brian
> >>
> >> [1] https://github.com/apache/beam/blob/master/sdks/python/setup.py
> >> [2]
> https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt
> >> [3] https://github.com/apache/beam/pull/9236
> >>
> >>
> >>
> >> PS - What caused my precommit failure:
> >> I tested out the portable runner locally to debug the issue. It worked
> fine with the loopback worker, but failed when using the docker worker. I
> eventually determined the actual cause: I unintentionally relied on a
> protobuf 3.8.0 feature in the code I added. The setup.py requirement, used
> locally in the loopback worker, specified a range requirement which
> resolved to 3.9.0, but the dependency for the container was pinned to
> 3.6.1. I've proposed to resolve the issue by just bumping the lower bound
> of the range in setup.py to 3.8.0, to reflect the new requirement.
>

Reply via email to