The way the Python SDK currently does this is to use the version as the
default tag, eg 2.16.0. While master uses 2.16.0.dev. This means there
should never be any conflicts between a release and developer image, unless
the user deliberately changes the image tags.

> if a users' pipeline is relies on a  container image released by Beam (
or maybe a third party), external updates to such container image may not
propagate to the pipeline workflow without an explicit pull

There should only be one released container per release. Upgrades to a
container image should not happen independently of the release process.

Note that so far I've just been discussing defaults. It's always possible
to use a custom container using environment_config, as mentioned earlier.
The goal is to make that unnecessary for most everyday use cases and
development. Using different container images for different transforms is a
more specialized use case worth a separate discussion.

On Wed, Nov 6, 2019 at 11:33 AM Valentyn Tymofieiev <[email protected]>
wrote:

> Anyway, I agree with Thomas that implicitly running `docker pull` is
>> confusing and requires some adjustments to work around. The user can always
>> run `docker pull` themselves if that's the intention.
>
>
> I understand that implicit pull may come across as surprising. However I
> see the  required adjustments as a better practice. I would argue that
> customized containers images should not reuse the same  name:tag
> combination, and it would also help the users avoid a situation where a
> runner may use a different container image in different execution
> environments.
> It may also help avoid issue where a user reports an issue with Beam, that
> others cannot reproduce only because a user was running a customized
> container on their local machine (and forgot about it).
> Also, if a users' pipeline is relies on a  container image released by
> Beam ( or maybe a third party), external updates to such container image
> may not propagate to the pipeline workflow without an explicit pull
>
>> > 1. Read sdk version from gradle.properties and use this as the default
>> tag. Done with Python, need to implement it with Java and Go.
>>
>> 100% agree with this one. Using the same tag for local and release images
>> has already caused a good deal of confusion. Filed BEAM-8570 and BEAM-8571
>> [2][3].
>>
>> > 2. Remove pulling images before executing docker run command. This
>> should be fixed for Python, Java and Go.
>>
>> Valentyn (from [1]):
>> > I think pulling the latest image for the current tag is actually a
>> desired behavior, in case the external image was updated (due to a bug fix
>> for example).
>>
>> There's a PR for this [4]. Once we fix the default tag for Java/Go
>> containers, the dev and release containers will be distinct, which makes it
>> seldom important whether or not the image is `docker pull`ed. Anyway, I
>> agree with Thomas that implicitly running `docker pull` is confusing and
>> requires some adjustments to work around. The user can always run `docker
>> pull` themselves if that's the intention.
>>
>> [1]
>> https://lists.apache.org/thread.html/0f2ccbbe7969b91dc21ba331c1a30d730e268cc0355c1ac1ba0b7988@%3Cdev.beam.apache.org%3E
>> [2] https://issues.apache.org/jira/browse/BEAM-8570
>> [3] https://issues.apache.org/jira/browse/BEAM-8571
>> [4] https://github.com/apache/beam/pull/9972
>>
>> On Wed, Oct 2, 2019 at 5:32 PM Ahmet Altay <[email protected]> wrote:
>>
>>> I do not believe this is a blocker for Beam 2.16. I agree that it would
>>> be good to fix this.
>>>
>>> On Wed, Oct 2, 2019 at 3:15 PM Hannah Jiang <[email protected]>
>>> wrote:
>>>
>>>> Hi Thomas
>>>>
>>>> Thanks for bring this up.
>>>>
>>>> Now Python uses sdk version as a default tag, while Java and Go use
>>>> latest as a default tag. I agree using latest as a tag is problematic. The
>>>> reason only Python uses sdk version as a default tag is Python has
>>>> version.py so the version is easy to read. For Java and Go, we need to read
>>>> it from gradle.properties when creating images with the default tag and
>>>> when setting the default image.
>>>>
>>>> Here is what we need to do:
>>>> 1. Read sdk version from gradle.properties and use this as the default
>>>> tag. Done with Python, need to implement it with Java and Go.
>>>> 2. Remove pulling images before executing docker run command. This
>>>> should be fixed for Python, Java and Go.
>>>>
>>>> Is this a blocker for 2.16? If so and above are too much work for 2.16
>>>> at the moment, we can hardcode the default tag for release branch for now.
>>>>
>>>> Using timestamp as a tag is an option as well, as long as runners know
>>>> which timestamp they should use.
>>>>
>>>> Hannah
>>>>
>>>> On Wed, Oct 2, 2019 at 10:13 AM Alan Myrvold <[email protected]>
>>>> wrote:
>>>>
>>>>> Yes, using the latest tag is problematic and can lead to unexpected
>>>>> behavior.
>>>>> Using a date/time or 2.17.0.dev-$USER tag would be better. The
>>>>> validates container shell script uses a datetime
>>>>> <https://github.com/apache/beam/blob/6551d0937ee31a8e310b63b222dbc750ec9331f8/sdks/python/container/run_validatescontainer.sh#L87>
>>>>> tag, which allows a unique name if no two tests are run in the same 
>>>>> second.
>>>>>
>>>>> On Wed, Oct 2, 2019 at 10:05 AM Thomas Weise <[email protected]> wrote:
>>>>>
>>>>>> Want to bump this thread.
>>>>>>
>>>>>> If the current behavior is to replace locally built image with the
>>>>>> last published, then this is not only unexpected for developers but also
>>>>>> problematic for the CI, where tests should run against what was built 
>>>>>> from
>>>>>> source. Or am I missing something?
>>>>>>
>>>>>> Thanks,
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 24, 2019 at 7:08 PM Thomas Weise <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Hannah,
>>>>>>>
>>>>>>> I believe this is unexpected from the developer perspective. When
>>>>>>> building something locally, we do expect that to be used. We may need to
>>>>>>> change to not pull when the image is available locally, at least when 
>>>>>>> it is
>>>>>>> a snapshot/master branch. Release images should be immutable anyways.
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 24, 2019 at 4:13 PM Hannah Jiang <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> A minor update, with custom container, the pipeline would not fail,
>>>>>>>> it throws out warning and moves on to `docker run` command.
>>>>>>>>
>>>>>>>> On Tue, Sep 24, 2019 at 4:05 PM Hannah Jiang <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Brian
>>>>>>>>>
>>>>>>>>> If we pull docker images, it always downloads from remote
>>>>>>>>> repository, which is expected behavior.
>>>>>>>>> In case we want to run a local image and pull it only when the
>>>>>>>>> image is not available at local, we can use `docker run` command 
>>>>>>>>> directly,
>>>>>>>>> without pulling it in advance. [1]
>>>>>>>>> In case we want to pull images only when they are not available at
>>>>>>>>> local, we can use `docker images -q` to check if images are existing 
>>>>>>>>> at
>>>>>>>>> local before pulling it.
>>>>>>>>> Another option is re-tag your local image, pass your image to
>>>>>>>>> pipeline and overwrite default one, but the code is still trying to 
>>>>>>>>> pull,
>>>>>>>>> so if your image is not pushed to the remote repository, it would 
>>>>>>>>> fail.
>>>>>>>>>
>>>>>>>>> 1. https://github.com/docker/cli/pull/1498
>>>>>>>>>
>>>>>>>>> Hannah
>>>>>>>>>
>>>>>>>>> On Tue, Sep 24, 2019 at 11:56 AM Brian Hulette <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I'm working on a demo cross-language pipeline on a local flink
>>>>>>>>>> cluster that relies on my python row coder PR [1]. The PR includes 
>>>>>>>>>> some
>>>>>>>>>> changes to the Java worker code, so I need to build a Java SDK 
>>>>>>>>>> container
>>>>>>>>>> locally and use that in the pipeline.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, whenever I run the pipeline,
>>>>>>>>>> the apachebeam/java_sdk:latest tag is moved off of my locally built 
>>>>>>>>>> image
>>>>>>>>>> to a newly downloaded image with a creation date 2 weeks ago, and 
>>>>>>>>>> that
>>>>>>>>>> image is used instead. It looks like the reason is we run `docker 
>>>>>>>>>> pull`
>>>>>>>>>> before running the container [2]. As the comment says this should be 
>>>>>>>>>> a
>>>>>>>>>> no-op if the image already exists, but that doesn't seem to be the 
>>>>>>>>>> case. If
>>>>>>>>>> I just run `docker pull apachebeam/java_sdk:latest` on my local 
>>>>>>>>>> machine it
>>>>>>>>>> downloads the 2 week old image and happily informs me:
>>>>>>>>>>
>>>>>>>>>> Status: Downloaded newer image for apachebeam/java_sdk:latest
>>>>>>>>>>
>>>>>>>>>> Does anyone know how I can prevent `docker pull` from doing this?
>>>>>>>>>> I can unblock myself for now just by commenting out the docker pull
>>>>>>>>>> command, but I'd like to understand what is going on here.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Brian
>>>>>>>>>>
>>>>>>>>>> [1] https://github.com/apache/beam/pull/9188
>>>>>>>>>> [2]
>>>>>>>>>> https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java#L80
>>>>>>>>>>
>>>>>>>>>

Reply via email to