As a developer on Beam, I expect something that I built locally to be used.
That's also the case with Java and Python dependencies.

Those carry -SNAPSHOT or .dev versions and are distinct from releases. So
perhaps this is just a matter of correct version number management?

Users who consume our releases (release versions) should get the published
Docker images.

It would be great if that just worked out of the box, without extra
pipeline options magic.
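A minimal sketch of the version-based policy suggested above. It is written in Python for illustration only. Development versions (`-SNAPSHOT` or `.dev`) are treated as locally built and are pulled only if missing. Release versions always resolve to the published image. The naming scheme `apachebeam/<language>_sdk:<version>` generalizes the `apachebeam/java_sdk` name used later in this thread and is an assumption. The function names are hypothetical, not existing Beam APIs.

```python
def is_dev_version(sdk_version: str) -> bool:
    """Return True for development builds (Java '-SNAPSHOT', Python '.dev')."""
    return sdk_version.endswith("-SNAPSHOT") or ".dev" in sdk_version


def default_image(language: str, sdk_version: str) -> str:
    """Build the default image name (hypothetical naming scheme)."""
    return f"apachebeam/{language}_sdk:{sdk_version}"


def pull_policy(sdk_version: str) -> str:
    """Decide when to pull the default image for a given SDK version.

    'missing': prefer a locally built dev image; pull only if it is absent.
    'always':  release images are published, so fetch the registry copy.
    """
    return "missing" if is_dev_version(sdk_version) else "always"


if __name__ == "__main__":
    for version in ("2.17.0-SNAPSHOT", "2.17.0.dev", "2.16.0"):
        print(default_image("java", version), pull_policy(version))
```

Whether release images need to be pulled every time is debated further down the thread. If they really are immutable, `missing` would work for them as well.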

Thomas

On Wed, Nov 6, 2019 at 11:33 AM Valentyn Tymofieiev <valen...@google.com>
wrote:

>> Anyway, I agree with Thomas that implicitly running `docker pull` is
>> confusing and requires some adjustments to work around. The user can always
>> run `docker pull` themselves if that's the intention.
>
>
> I understand that an implicit pull may come across as surprising. However,
> I see the required adjustments as a better practice. I would argue that
> customized container images should not reuse the same name:tag
> combination. Unique names would also help users avoid a situation where a
> runner uses a different container image in different execution
> environments.
> It may also help avoid situations where a user reports an issue with Beam
> that others cannot reproduce, only because that user was running a
> customized container on their local machine (and forgot about it).
> Also, if a user's pipeline relies on a container image released by Beam
> (or maybe by a third party), external updates to that container image may
> not propagate to the pipeline workflow without an explicit pull. Always
> pulling the image may help ensure more deterministic behavior.
>
> On Wed, Nov 6, 2019 at 10:38 AM Kyle Weaver <kcwea...@google.com> wrote:
>
>> Bumping this thread from the other one [1].
>>
>> > 1. Read the SDK version from gradle.properties and use it as the default
>> > tag. Done for Python; still needs to be implemented for Java and Go.
>>
>> 100% agree with this one. Using the same tag for local and release images
>> has already caused a good deal of confusion. Filed BEAM-8570 and BEAM-8571
>> [2][3].
>>
>> > 2. Stop pulling images before executing the `docker run` command. This
>> > should be fixed for Python, Java and Go.
>>
>> Valentyn (from [1]):
>> > I think pulling the latest image for the current tag is actually the
>> > desired behavior, in case the external image was updated (due to a bug
>> > fix, for example).
>>
>> There's a PR for this [4]. Once we fix the default tag for Java/Go
>> containers, the dev and release containers will be distinct, which makes it
>> mostly irrelevant whether or not the image is `docker pull`ed. Anyway, I
>> agree with Thomas that implicitly running `docker pull` is confusing and
>> requires some adjustments to work around. The user can always run
>> `docker pull` themselves if that's the intention.
>>
>> [1] https://lists.apache.org/thread.html/0f2ccbbe7969b91dc21ba331c1a30d730e268cc0355c1ac1ba0b7988@%3Cdev.beam.apache.org%3E
>> [2] https://issues.apache.org/jira/browse/BEAM-8570
>> [3] https://issues.apache.org/jira/browse/BEAM-8571
>> [4] https://github.com/apache/beam/pull/9972
>>
>> On Wed, Oct 2, 2019 at 5:32 PM Ahmet Altay <al...@google.com> wrote:
>>
>>> I do not believe this is a blocker for Beam 2.16. I agree that it would
>>> be good to fix this.
>>>
>>> On Wed, Oct 2, 2019 at 3:15 PM Hannah Jiang <hannahji...@verily.com>
>>> wrote:
>>>
>>>> Hi Thomas
>>>>
>>>> Thanks for bringing this up.
>>>>
>>>> Currently, Python uses the SDK version as the default tag, while Java and
>>>> Go use latest as the default tag. I agree that using latest as a tag is
>>>> problematic. The reason only Python uses the SDK version as the default tag
>>>> is that Python has version.py, so the version is easy to read. For Java and
>>>> Go, we need to read it from gradle.properties, both when building images
>>>> with the default tag and when setting the default image.
>>>>
>>>> Here is what we need to do:
>>>> 1. Read the SDK version from gradle.properties and use it as the default
>>>> tag. Done for Python; still needs to be implemented for Java and Go.
>>>> 2. Stop pulling images before executing the `docker run` command. This
>>>> should be fixed for Python, Java and Go.
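A minimal sketch of step 1 for the Java and Go images, written in Python for illustration. It parses simple `key=value` lines from gradle.properties and returns the version to use as the default tag. It assumes the version is stored under a key named `sdk_version`, falling back to `version`; both key names are assumptions to check against Beam's actual gradle.properties. Only plain `key=value` lines are handled, not the full Java properties syntax (continuations, escapes, `:` separators).

```python
from pathlib import Path
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple 'key=value' lines, skipping blanks and '#'/'!' comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        props[key.strip()] = value.strip()
    return props


def default_tag(properties_text: str) -> str:
    """Return the SDK version to use as the default image tag.

    Prefers the (assumed) 'sdk_version' key, then falls back to 'version'.
    """
    props = parse_properties(properties_text)
    for key in ("sdk_version", "version"):
        if key in props:
            return props[key]
    raise KeyError("no sdk_version or version entry found")


def default_tag_from_file(path: str = "gradle.properties") -> str:
    """Read the properties file and derive the default tag from it."""
    return default_tag(Path(path).read_text(encoding="utf-8"))
```

Both values in the usage sample below (`2.17.0-SNAPSHOT`, `2.17.0.dev`) are valid Docker tags as-is. Either one keeps dev images separate from release images.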
>>>>
>>>> Is this a blocker for 2.16? If so, and the above is too much work for
>>>> 2.16 at the moment, we can hardcode the default tag on the release branch
>>>> for now.
>>>>
>>>> Using a timestamp as a tag is an option as well, as long as runners know
>>>> which timestamp to use.
>>>>
>>>> Hannah
>>>>
>>>> On Wed, Oct 2, 2019 at 10:13 AM Alan Myrvold <amyrv...@google.com>
>>>> wrote:
>>>>
>>>>> Yes, using the latest tag is problematic and can lead to unexpected
>>>>> behavior.
>>>>> Using a date/time or 2.17.0.dev-$USER tag would be better. The
>>>>> ValidatesContainer shell script uses a datetime tag
>>>>> <https://github.com/apache/beam/blob/6551d0937ee31a8e310b63b222dbc750ec9331f8/sdks/python/container/run_validatescontainer.sh#L87>,
>>>>> which gives a unique name as long as no two tests run in the same
>>>>> second.
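A minimal sketch of the two unique-tag schemes mentioned above: a `<version>-<user>` tag and a datetime tag. The exact formats, including `%Y%m%d-%H%M%S`, are assumptions and not necessarily what run_validatescontainer.sh produces. Docker tags may contain only letters, digits, `_`, `.` and `-`, may not start with `.` or `-`, and are limited to 128 characters. For that reason the user name is sanitized and the result is truncated.

```python
import getpass
import re
from datetime import datetime, timezone
from typing import Optional

# Characters not allowed in a Docker tag are replaced with '_'.
_INVALID_TAG_CHARS = re.compile(r"[^A-Za-z0-9_.-]")


def user_tag(sdk_version: str, user: Optional[str] = None) -> str:
    """Tag like '2.17.0.dev-alice', so each developer's image is distinct."""
    user = user if user is not None else getpass.getuser()
    tag = f"{sdk_version}-{_INVALID_TAG_CHARS.sub('_', user)}"
    return tag[:128]  # Docker's maximum tag length


def datetime_tag(sdk_version: str, now: Optional[datetime] = None) -> str:
    """Tag like '2.17.0.dev-20191002-101500'.

    Unique unless two builds start within the same second.
    """
    now = now if now is not None else datetime.now(timezone.utc)
    return f"{sdk_version}-{now:%Y%m%d-%H%M%S}"
```

With either scheme, a runner still has to be told which tag to use. This is the point Hannah raises about timestamps.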
>>>>>
>>>>> On Wed, Oct 2, 2019 at 10:05 AM Thomas Weise <t...@apache.org> wrote:
>>>>>
>>>>>> Want to bump this thread.
>>>>>>
>>>>>> If the current behavior is to replace the locally built image with the
>>>>>> last published one, then this is not only unexpected for developers but
>>>>>> also problematic for CI, where tests should run against what was built
>>>>>> from source. Or am I missing something?
>>>>>>
>>>>>> Thanks,
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 24, 2019 at 7:08 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Hannah,
>>>>>>>
>>>>>>> I believe this is unexpected from the developer perspective. When
>>>>>>> building something locally, we do expect that to be used. We may need to
>>>>>>> change this so that we don't pull when the image is available locally,
>>>>>>> at least for snapshot/master branch builds. Release images should be
>>>>>>> immutable anyway.
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 24, 2019 at 4:13 PM Hannah Jiang <hannahji...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> A minor update: with a custom container, the pipeline does not fail;
>>>>>>>> it emits a warning and moves on to the `docker run` command.
>>>>>>>>
>>>>>>>> On Tue, Sep 24, 2019 at 4:05 PM Hannah Jiang <
>>>>>>>> hannahji...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Brian
>>>>>>>>>
>>>>>>>>> If we pull Docker images, it always checks the remote repository and
>>>>>>>>> downloads the remote image if it differs from the local one, which is
>>>>>>>>> the expected behavior.
>>>>>>>>> If we want to run a local image and pull it only when the image is not
>>>>>>>>> available locally, we can use the `docker run` command directly,
>>>>>>>>> without pulling it in advance. [1]
>>>>>>>>> If we want to pull images only when they are not available locally, we
>>>>>>>>> can use `docker images -q` to check whether the image exists locally
>>>>>>>>> before pulling it.
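A minimal sketch of the `docker images -q` check described above, written in Python with `subprocess`. `docker images -q IMAGE` prints the IDs of matching local images and prints nothing if none exist. An empty output therefore means the image has to be pulled. The command runner is a parameter, so the logic can be exercised without Docker. `run_docker`, `image_exists_locally` and `ensure_image` are hypothetical names.

```python
import subprocess
from typing import Callable, List

# A runner takes docker arguments (without 'docker') and returns stdout.
Runner = Callable[[List[str]], str]


def run_docker(args: List[str]) -> str:
    """Run a docker command and return its stdout (raises on non-zero exit)."""
    result = subprocess.run(
        ["docker", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def image_exists_locally(image: str, run: Runner = run_docker) -> bool:
    # `docker images -q IMAGE` prints image IDs, or nothing if absent.
    return bool(run(["images", "-q", image]).strip())


def ensure_image(image: str, run: Runner = run_docker) -> bool:
    """Pull `image` only if it is missing locally; return True if pulled."""
    if image_exists_locally(image, run):
        return False
    run(["pull", image])
    return True
```

With this check, a locally built image such as the one in Brian's report would be left alone. A missing image would still be fetched before `docker run`.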
>>>>>>>>> Another option is to re-tag your local image and pass it to the
>>>>>>>>> pipeline to override the default one, but the code still tries to pull
>>>>>>>>> it, so if your image has not been pushed to the remote repository, the
>>>>>>>>> pull would fail.
>>>>>>>>>
>>>>>>>>> [1] https://github.com/docker/cli/pull/1498
>>>>>>>>>
>>>>>>>>> Hannah
>>>>>>>>>
>>>>>>>>> On Tue, Sep 24, 2019 at 11:56 AM Brian Hulette <
>>>>>>>>> bhule...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> I'm working on a demo cross-language pipeline on a local Flink
>>>>>>>>>> cluster that relies on my Python row coder PR [1]. The PR includes
>>>>>>>>>> some changes to the Java worker code, so I need to build a Java SDK
>>>>>>>>>> container locally and use that in the pipeline.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, whenever I run the pipeline, the
>>>>>>>>>> apachebeam/java_sdk:latest tag is moved off of my locally built image
>>>>>>>>>> to a newly downloaded image with a creation date of 2 weeks ago, and
>>>>>>>>>> that image is used instead. It looks like the reason is that we run
>>>>>>>>>> `docker pull` before running the container [2]. As the comment says,
>>>>>>>>>> this should be a no-op if the image already exists, but that doesn't
>>>>>>>>>> seem to be the case. If I just run
>>>>>>>>>> `docker pull apachebeam/java_sdk:latest` on my local machine, it
>>>>>>>>>> downloads the 2-week-old image and happily informs me:
>>>>>>>>>>
>>>>>>>>>> Status: Downloaded newer image for apachebeam/java_sdk:latest
>>>>>>>>>>
>>>>>>>>>> Does anyone know how I can prevent `docker pull` from doing this?
>>>>>>>>>> I can unblock myself for now just by commenting out the docker pull
>>>>>>>>>> command, but I'd like to understand what is going on here.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Brian
>>>>>>>>>>
>>>>>>>>>> [1] https://github.com/apache/beam/pull/9188
>>>>>>>>>> [2] https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java#L80
>>>>>>>>>>
>>>>>>>>>
