> These are -SNAPSHOT or .dev and distinct from releases. So perhaps this is just a matter of correct version number management?
In light of the concerns Valentyn & Ahmet raise, it seems safer to change tags instead of removing the pull. PR for Java: https://github.com/apache/beam/pull/10017 On Wed, Nov 6, 2019 at 1:41 PM Ahmet Altay <[email protected]> wrote: > > > On Wed, Nov 6, 2019 at 1:34 PM Valentyn Tymofieiev <[email protected]> > wrote: > >> On Wed, Nov 6, 2019 at 11:48 AM Kyle Weaver <[email protected]> wrote: >> >>> The way the Python SDK currently does this is to use the version as the >>> default tag, eg 2.16.0. While master uses 2.16.0.dev. This means there >>> should never be any conflicts between a release and developer image, unless >>> the user deliberately changes the image tags. >>> >>> > if a users' pipeline is relies on a container image released by Beam >>> ( or maybe a third party), external updates to such container image may not >>> propagate to the pipeline workflow without an explicit pull >>> >>> There should only be one released container per release. Upgrades to a >>> container image should not happen independently of the release process. >>> >> >> Fair point, although we have not yet encountered issues requiring an >> update of a previously released Docker, so I would not rule out >> considerations requiring us to re-release the image under the same tag. A >> scenario that is possible today is multiple pushes of container image to >> docker repo before the Beam release is finalized, so early adopters may be >> affected by stale images without a pull. >> > > This is an interesting problem. It is true that adopters of RCs may get > stuck pre-release candidates of those images. Could we still docker pull > only if user is trying to use a default and released image tag? > > >> >> >>> Note that so far I've just been discussing defaults. It's always >>> possible to use a custom container using environment_config, as mentioned >>> earlier. >>> >> >> My understanding is that to pull or not to pull decision equally applies >> to custom image provided by environment config. >> >> >>> The goal is to make that unnecessary for most everyday use cases and >>> development. Using different container images for different transforms is a >>> more specialized use case worth a separate discussion. >>> >>> On Wed, Nov 6, 2019 at 11:33 AM Valentyn Tymofieiev <[email protected]> >>> wrote: >>> >>>> Anyway, I agree with Thomas that implicitly running `docker pull` is >>>>> confusing and requires some adjustments to work around. The user can >>>>> always >>>>> run `docker pull` themselves if that's the intention. >>>> >>>> >>>> I understand that implicit pull may come across as surprising. However >>>> I see the required adjustments as a better practice. I would argue that >>>> customized containers images should not reuse the same name:tag >>>> combination, and it would also help the users avoid a situation where a >>>> runner may use a different container image in different execution >>>> environments. >>>> It may also help avoid issue where a user reports an issue with Beam, >>>> that others cannot reproduce only because a user was running a customized >>>> container on their local machine (and forgot about it). >>>> Also, if a users' pipeline is relies on a container image released by >>>> Beam ( or maybe a third party), external updates to such container image >>>> may not propagate to the pipeline workflow without an explicit pull >>>> >>>>> > 1. Read sdk version from gradle.properties and use this as the >>>>> default tag. Done with Python, need to implement it with Java and Go. >>>>> >>>>> 100% agree with this one. Using the same tag for local and release >>>>> images has already caused a good deal of confusion. Filed BEAM-8570 and >>>>> BEAM-8571 [2][3]. >>>>> >>>>> > 2. Remove pulling images before executing docker run command. This >>>>> should be fixed for Python, Java and Go. >>>>> >>>>> Valentyn (from [1]): >>>>> > I think pulling the latest image for the current tag is actually a >>>>> desired behavior, in case the external image was updated (due to a bug fix >>>>> for example). >>>>> >>>>> There's a PR for this [4]. Once we fix the default tag for Java/Go >>>>> containers, the dev and release containers will be distinct, which makes >>>>> it >>>>> seldom important whether or not the image is `docker pull`ed. Anyway, I >>>>> agree with Thomas that implicitly running `docker pull` is confusing and >>>>> requires some adjustments to work around. The user can always run `docker >>>>> pull` themselves if that's the intention. >>>>> >>>>> [1] >>>>> https://lists.apache.org/thread.html/0f2ccbbe7969b91dc21ba331c1a30d730e268cc0355c1ac1ba0b7988@%3Cdev.beam.apache.org%3E >>>>> [2] https://issues.apache.org/jira/browse/BEAM-8570 >>>>> [3] https://issues.apache.org/jira/browse/BEAM-8571 >>>>> [4] https://github.com/apache/beam/pull/9972 >>>>> >>>>> On Wed, Oct 2, 2019 at 5:32 PM Ahmet Altay <[email protected]> wrote: >>>>> >>>>>> I do not believe this is a blocker for Beam 2.16. I agree that it >>>>>> would be good to fix this. >>>>>> >>>>>> On Wed, Oct 2, 2019 at 3:15 PM Hannah Jiang <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Hi Thomas >>>>>>> >>>>>>> Thanks for bring this up. >>>>>>> >>>>>>> Now Python uses sdk version as a default tag, while Java and Go use >>>>>>> latest as a default tag. I agree using latest as a tag is problematic. >>>>>>> The >>>>>>> reason only Python uses sdk version as a default tag is Python has >>>>>>> version.py so the version is easy to read. For Java and Go, we need to >>>>>>> read >>>>>>> it from gradle.properties when creating images with the default tag and >>>>>>> when setting the default image. >>>>>>> >>>>>>> Here is what we need to do: >>>>>>> 1. Read sdk version from gradle.properties and use this as the >>>>>>> default tag. Done with Python, need to implement it with Java and Go. >>>>>>> 2. Remove pulling images before executing docker run command. This >>>>>>> should be fixed for Python, Java and Go. >>>>>>> >>>>>>> Is this a blocker for 2.16? If so and above are too much work for >>>>>>> 2.16 at the moment, we can hardcode the default tag for release branch >>>>>>> for >>>>>>> now. >>>>>>> >>>>>>> Using timestamp as a tag is an option as well, as long as runners >>>>>>> know which timestamp they should use. >>>>>>> >>>>>>> Hannah >>>>>>> >>>>>>> On Wed, Oct 2, 2019 at 10:13 AM Alan Myrvold <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Yes, using the latest tag is problematic and can lead to unexpected >>>>>>>> behavior. >>>>>>>> Using a date/time or 2.17.0.dev-$USER tag would be better. The >>>>>>>> validates container shell script uses a datetime >>>>>>>> <https://github.com/apache/beam/blob/6551d0937ee31a8e310b63b222dbc750ec9331f8/sdks/python/container/run_validatescontainer.sh#L87> >>>>>>>> tag, which allows a unique name if no two tests are run in the same >>>>>>>> second. >>>>>>>> >>>>>>>> On Wed, Oct 2, 2019 at 10:05 AM Thomas Weise <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Want to bump this thread. >>>>>>>>> >>>>>>>>> If the current behavior is to replace locally built image with the >>>>>>>>> last published, then this is not only unexpected for developers but >>>>>>>>> also >>>>>>>>> problematic for the CI, where tests should run against what was built >>>>>>>>> from >>>>>>>>> source. Or am I missing something? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Thomas >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Sep 24, 2019 at 7:08 PM Thomas Weise <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Hannah, >>>>>>>>>> >>>>>>>>>> I believe this is unexpected from the developer perspective. When >>>>>>>>>> building something locally, we do expect that to be used. We may >>>>>>>>>> need to >>>>>>>>>> change to not pull when the image is available locally, at least >>>>>>>>>> when it is >>>>>>>>>> a snapshot/master branch. Release images should be immutable anyways. >>>>>>>>>> >>>>>>>>>> Thomas >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Sep 24, 2019 at 4:13 PM Hannah Jiang < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> A minor update, with custom container, the pipeline would not >>>>>>>>>>> fail, it throws out warning and moves on to `docker run` command. >>>>>>>>>>> >>>>>>>>>>> On Tue, Sep 24, 2019 at 4:05 PM Hannah Jiang < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Brian >>>>>>>>>>>> >>>>>>>>>>>> If we pull docker images, it always downloads from remote >>>>>>>>>>>> repository, which is expected behavior. >>>>>>>>>>>> In case we want to run a local image and pull it only when the >>>>>>>>>>>> image is not available at local, we can use `docker run` command >>>>>>>>>>>> directly, >>>>>>>>>>>> without pulling it in advance. [1] >>>>>>>>>>>> In case we want to pull images only when they are not >>>>>>>>>>>> available at local, we can use `docker images -q` to check if >>>>>>>>>>>> images are >>>>>>>>>>>> existing at local before pulling it. >>>>>>>>>>>> Another option is re-tag your local image, pass your image to >>>>>>>>>>>> pipeline and overwrite default one, but the code is still trying >>>>>>>>>>>> to pull, >>>>>>>>>>>> so if your image is not pushed to the remote repository, it would >>>>>>>>>>>> fail. >>>>>>>>>>>> >>>>>>>>>>>> 1. https://github.com/docker/cli/pull/1498 >>>>>>>>>>>> >>>>>>>>>>>> Hannah >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Sep 24, 2019 at 11:56 AM Brian Hulette < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I'm working on a demo cross-language pipeline on a local flink >>>>>>>>>>>>> cluster that relies on my python row coder PR [1]. The PR >>>>>>>>>>>>> includes some >>>>>>>>>>>>> changes to the Java worker code, so I need to build a Java SDK >>>>>>>>>>>>> container >>>>>>>>>>>>> locally and use that in the pipeline. >>>>>>>>>>>>> >>>>>>>>>>>>> Unfortunately, whenever I run the pipeline, >>>>>>>>>>>>> the apachebeam/java_sdk:latest tag is moved off of my locally >>>>>>>>>>>>> built image >>>>>>>>>>>>> to a newly downloaded image with a creation date 2 weeks ago, and >>>>>>>>>>>>> that >>>>>>>>>>>>> image is used instead. It looks like the reason is we run `docker >>>>>>>>>>>>> pull` >>>>>>>>>>>>> before running the container [2]. As the comment says this should >>>>>>>>>>>>> be a >>>>>>>>>>>>> no-op if the image already exists, but that doesn't seem to be >>>>>>>>>>>>> the case. If >>>>>>>>>>>>> I just run `docker pull apachebeam/java_sdk:latest` on my local >>>>>>>>>>>>> machine it >>>>>>>>>>>>> downloads the 2 week old image and happily informs me: >>>>>>>>>>>>> >>>>>>>>>>>>> Status: Downloaded newer image for apachebeam/java_sdk:latest >>>>>>>>>>>>> >>>>>>>>>>>>> Does anyone know how I can prevent `docker pull` from doing >>>>>>>>>>>>> this? I can unblock myself for now just by commenting out the >>>>>>>>>>>>> docker pull >>>>>>>>>>>>> command, but I'd like to understand what is going on here. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Brian >>>>>>>>>>>>> >>>>>>>>>>>>> [1] https://github.com/apache/beam/pull/9188 >>>>>>>>>>>>> [2] >>>>>>>>>>>>> https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java#L80 >>>>>>>>>>>>> >>>>>>>>>>>>
