As developer in Beam, I expect something that I built locally to be used. That's the case with Java and Python dependencies also.
These are -SNAPSHOT or .dev and distinct from releases. So perhaps this is just a matter of correct version number management? Users that consume our releases (version) should see the published Docker images. It would be great if that just works out of the box, w/o extra pipeline options magic. Thomas On Wed, Nov 6, 2019 at 11:33 AM Valentyn Tymofieiev <valen...@google.com> wrote: > Anyway, I agree with Thomas that implicitly running `docker pull` is >> confusing and requires some adjustments to work around. The user can always >> run `docker pull` themselves if that's the intention. > > > I understand that implicit pull may come across as surprising. However I > see the required adjustments as a better practice. I would argue that > customized containers images should not reuse the same name:tag > combination, and it would also help the users avoid a situation where a > runner may use a different container image in different execution > environments. > It may also help avoid issue where a user reports an issue with Beam, that > others cannot reproduce only because a user was running a customized > container on their local machine (and forgot about it). > Also, if a users' pipeline is relies on a container image released by > Beam ( or maybe a third party), external updates to such container image > may not propagate to the pipeline workflow without an explicit pull. Always > pulling the image may help to ensure a more deterministic behavior. > > On Wed, Nov 6, 2019 at 10:38 AM Kyle Weaver <kcwea...@google.com> wrote: > >> Bumping this thread from the other one [1]. >> >> > 1. Read sdk version from gradle.properties and use this as the default >> tag. Done with Python, need to implement it with Java and Go. >> >> 100% agree with this one. Using the same tag for local and release images >> has already caused a good deal of confusion. Filed BEAM-8570 and BEAM-8571 >> [2][3]. >> >> > 2. Remove pulling images before executing docker run command. This >> should be fixed for Python, Java and Go. >> >> Valentyn (from [1]): >> > I think pulling the latest image for the current tag is actually a >> desired behavior, in case the external image was updated (due to a bug fix >> for example). >> >> There's a PR for this [4]. Once we fix the default tag for Java/Go >> containers, the dev and release containers will be distinct, which makes it >> seldom important whether or not the image is `docker pull`ed. Anyway, I >> agree with Thomas that implicitly running `docker pull` is confusing and >> requires some adjustments to work around. The user can always run `docker >> pull` themselves if that's the intention. >> >> [1] >> https://lists.apache.org/thread.html/0f2ccbbe7969b91dc21ba331c1a30d730e268cc0355c1ac1ba0b7988@%3Cdev.beam.apache.org%3E >> [2] https://issues.apache.org/jira/browse/BEAM-8570 >> [3] https://issues.apache.org/jira/browse/BEAM-8571 >> [4] https://github.com/apache/beam/pull/9972 >> >> On Wed, Oct 2, 2019 at 5:32 PM Ahmet Altay <al...@google.com> wrote: >> >>> I do not believe this is a blocker for Beam 2.16. I agree that it would >>> be good to fix this. >>> >>> On Wed, Oct 2, 2019 at 3:15 PM Hannah Jiang <hannahji...@verily.com> >>> wrote: >>> >>>> Hi Thomas >>>> >>>> Thanks for bring this up. >>>> >>>> Now Python uses sdk version as a default tag, while Java and Go use >>>> latest as a default tag. I agree using latest as a tag is problematic. The >>>> reason only Python uses sdk version as a default tag is Python has >>>> version.py so the version is easy to read. For Java and Go, we need to read >>>> it from gradle.properties when creating images with the default tag and >>>> when setting the default image. >>>> >>>> Here is what we need to do: >>>> 1. Read sdk version from gradle.properties and use this as the default >>>> tag. Done with Python, need to implement it with Java and Go. >>>> 2. Remove pulling images before executing docker run command. This >>>> should be fixed for Python, Java and Go. >>>> >>>> Is this a blocker for 2.16? If so and above are too much work for 2.16 >>>> at the moment, we can hardcode the default tag for release branch for now. >>>> >>>> Using timestamp as a tag is an option as well, as long as runners know >>>> which timestamp they should use. >>>> >>>> Hannah >>>> >>>> On Wed, Oct 2, 2019 at 10:13 AM Alan Myrvold <amyrv...@google.com> >>>> wrote: >>>> >>>>> Yes, using the latest tag is problematic and can lead to unexpected >>>>> behavior. >>>>> Using a date/time or 2.17.0.dev-$USER tag would be better. The >>>>> validates container shell script uses a datetime >>>>> <https://github.com/apache/beam/blob/6551d0937ee31a8e310b63b222dbc750ec9331f8/sdks/python/container/run_validatescontainer.sh#L87> >>>>> tag, which allows a unique name if no two tests are run in the same >>>>> second. >>>>> >>>>> On Wed, Oct 2, 2019 at 10:05 AM Thomas Weise <t...@apache.org> wrote: >>>>> >>>>>> Want to bump this thread. >>>>>> >>>>>> If the current behavior is to replace locally built image with the >>>>>> last published, then this is not only unexpected for developers but also >>>>>> problematic for the CI, where tests should run against what was built >>>>>> from >>>>>> source. Or am I missing something? >>>>>> >>>>>> Thanks, >>>>>> Thomas >>>>>> >>>>>> >>>>>> On Tue, Sep 24, 2019 at 7:08 PM Thomas Weise <t...@apache.org> wrote: >>>>>> >>>>>>> Hi Hannah, >>>>>>> >>>>>>> I believe this is unexpected from the developer perspective. When >>>>>>> building something locally, we do expect that to be used. We may need to >>>>>>> change to not pull when the image is available locally, at least when >>>>>>> it is >>>>>>> a snapshot/master branch. Release images should be immutable anyways. >>>>>>> >>>>>>> Thomas >>>>>>> >>>>>>> >>>>>>> On Tue, Sep 24, 2019 at 4:13 PM Hannah Jiang <hannahji...@google.com> >>>>>>> wrote: >>>>>>> >>>>>>>> A minor update, with custom container, the pipeline would not fail, >>>>>>>> it throws out warning and moves on to `docker run` command. >>>>>>>> >>>>>>>> On Tue, Sep 24, 2019 at 4:05 PM Hannah Jiang < >>>>>>>> hannahji...@google.com> wrote: >>>>>>>> >>>>>>>>> Hi Brian >>>>>>>>> >>>>>>>>> If we pull docker images, it always downloads from remote >>>>>>>>> repository, which is expected behavior. >>>>>>>>> In case we want to run a local image and pull it only when the >>>>>>>>> image is not available at local, we can use `docker run` command >>>>>>>>> directly, >>>>>>>>> without pulling it in advance. [1] >>>>>>>>> In case we want to pull images only when they are not available at >>>>>>>>> local, we can use `docker images -q` to check if images are existing >>>>>>>>> at >>>>>>>>> local before pulling it. >>>>>>>>> Another option is re-tag your local image, pass your image to >>>>>>>>> pipeline and overwrite default one, but the code is still trying to >>>>>>>>> pull, >>>>>>>>> so if your image is not pushed to the remote repository, it would >>>>>>>>> fail. >>>>>>>>> >>>>>>>>> 1. https://github.com/docker/cli/pull/1498 >>>>>>>>> >>>>>>>>> Hannah >>>>>>>>> >>>>>>>>> On Tue, Sep 24, 2019 at 11:56 AM Brian Hulette < >>>>>>>>> bhule...@google.com> wrote: >>>>>>>>> >>>>>>>>>> I'm working on a demo cross-language pipeline on a local flink >>>>>>>>>> cluster that relies on my python row coder PR [1]. The PR includes >>>>>>>>>> some >>>>>>>>>> changes to the Java worker code, so I need to build a Java SDK >>>>>>>>>> container >>>>>>>>>> locally and use that in the pipeline. >>>>>>>>>> >>>>>>>>>> Unfortunately, whenever I run the pipeline, >>>>>>>>>> the apachebeam/java_sdk:latest tag is moved off of my locally built >>>>>>>>>> image >>>>>>>>>> to a newly downloaded image with a creation date 2 weeks ago, and >>>>>>>>>> that >>>>>>>>>> image is used instead. It looks like the reason is we run `docker >>>>>>>>>> pull` >>>>>>>>>> before running the container [2]. As the comment says this should be >>>>>>>>>> a >>>>>>>>>> no-op if the image already exists, but that doesn't seem to be the >>>>>>>>>> case. If >>>>>>>>>> I just run `docker pull apachebeam/java_sdk:latest` on my local >>>>>>>>>> machine it >>>>>>>>>> downloads the 2 week old image and happily informs me: >>>>>>>>>> >>>>>>>>>> Status: Downloaded newer image for apachebeam/java_sdk:latest >>>>>>>>>> >>>>>>>>>> Does anyone know how I can prevent `docker pull` from doing this? >>>>>>>>>> I can unblock myself for now just by commenting out the docker pull >>>>>>>>>> command, but I'd like to understand what is going on here. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Brian >>>>>>>>>> >>>>>>>>>> [1] https://github.com/apache/beam/pull/9188 >>>>>>>>>> [2] >>>>>>>>>> https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java#L80 >>>>>>>>>> >>>>>>>>>