There is a value in explicitly pinning the dependencies to be used in the
containers:
- It reproducibly produces the same container. This will be important once
we start release Beam container. By looking at a Beam release branch, one
could exactly figure out the set of dependencies available in a released
container.
- Package manager (pip) in this is notorious about resolving versions for
sub-dependencies, and sometimes picks incompatible dependencies.
- SImilarly this repdocubility is helpful with tests, that can work on the
same container version unaffected by numerous dependency version changes
happening in sub-dependecies.

In addition, I will argue that we will want to keep a form
of base_image_requirements.txt, in order to be able to install additional
dependencies on the container and we would not be able to get rid of this
mechanism.

I will suggest, If possible ensure that both files are modified
synchronously. This might be possible with precommit hooks although I am
not familiar.

Ahmet


On Fri, Aug 2, 2019 at 2:20 PM Brian Hulette <[email protected]> wrote:

> I recently ran into a portable python precommit failure that led me to
> discover that python dependencies for the container are defined in two
> different places, in slightly different ways: in setup.py with version
> ranges [1], and in a base_image_requirements.txt file in the container
> directory with pinned versions [2]. The latter includes some dependencies
> that are commonly used in user code, in addition to the Beam dependencies,
> and states that "Specifying the versions manually helps to resolve
> dependency conflicts with other packages installed in the container."
>
> But isn't the purpose of a package manager to resolve those conflicts? It
> seems that we should be able to just list the beam tar.gz as a dependency
> and let pip resolve it. I wrote up a patch [3] to test this approach and it
> seems to work. The only issue I ran into is that cython _does_ needs to be
> installed separately before beam so that it can be used when installing
> beam. Would this be a reasonable approach to remove the duplicate
> dependency specifications?
>
> Apologies if I'm rehashing an old discussion, if there's already an ML
> thread or PR discussing this issue I'd be happy to take a look at it.
>
> Thanks,
> Brian
>
> [1] https://github.com/apache/beam/blob/master/sdks/python/setup.py
> [2]
> https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt
> [3] https://github.com/apache/beam/pull/9236
>
>
>
> PS - What caused my precommit failure:
> I tested out the portable runner locally to debug the issue. It worked
> fine with the loopback worker, but failed when using the docker worker. I
> eventually determined the actual cause: I unintentionally relied on a
> protobuf 3.8.0 feature in the code I added. The setup.py requirement, used
> locally in the loopback worker, specified a range requirement which
> resolved to 3.9.0, but the dependency for the container was pinned to
> 3.6.1. I've proposed to resolve the issue by just bumping the lower bound
> of the range in setup.py to 3.8.0, to reflect the new requirement.
>

Reply via email to