> These are -SNAPSHOT or .dev and distinct from releases. So perhaps this
is just a matter of correct version number management?

In light of the concerns Valentyn & Ahmet raise, it seems safer to change
tags instead of removing the pull. PR for Java:
https://github.com/apache/beam/pull/10017

On Wed, Nov 6, 2019 at 1:41 PM Ahmet Altay <[email protected]> wrote:

>
>
> On Wed, Nov 6, 2019 at 1:34 PM Valentyn Tymofieiev <[email protected]>
> wrote:
>
>> On Wed, Nov 6, 2019 at 11:48 AM Kyle Weaver <[email protected]> wrote:
>>
>>> The way the Python SDK currently does this is to use the version as the
>>> default tag, eg 2.16.0. While master uses 2.16.0.dev. This means there
>>> should never be any conflicts between a release and developer image, unless
>>> the user deliberately changes the image tags.
>>>
>>> > if a users' pipeline is relies on a  container image released by Beam
>>> ( or maybe a third party), external updates to such container image may not
>>> propagate to the pipeline workflow without an explicit pull
>>>
>>> There should only be one released container per release. Upgrades to a
>>> container image should not happen independently of the release process.
>>>
>>
>> Fair point, although we have not yet encountered issues requiring  an
>> update of a previously released Docker, so I would not rule out
>> considerations requiring us to re-release the image under the same tag. A
>> scenario that is possible today is multiple pushes of container image to
>> docker repo before the Beam release is finalized, so early adopters may be
>> affected by stale images without  a pull.
>>
>
> This is an interesting problem. It is true that adopters of RCs may get
> stuck pre-release candidates of those images. Could we still docker pull
> only if user is trying to use a default and released image tag?
>
>
>>
>>
>>> Note that so far I've just been discussing defaults. It's always
>>> possible to use a custom container using environment_config, as mentioned
>>> earlier.
>>>
>>
>> My understanding is that to pull or not to pull decision equally applies
>> to custom image provided by environment config.
>>
>>
>>> The goal is to make that unnecessary for most everyday use cases and
>>> development. Using different container images for different transforms is a
>>> more specialized use case worth a separate discussion.
>>>
>>> On Wed, Nov 6, 2019 at 11:33 AM Valentyn Tymofieiev <[email protected]>
>>> wrote:
>>>
>>>> Anyway, I agree with Thomas that implicitly running `docker pull` is
>>>>> confusing and requires some adjustments to work around. The user can 
>>>>> always
>>>>> run `docker pull` themselves if that's the intention.
>>>>
>>>>
>>>> I understand that implicit pull may come across as surprising. However
>>>> I see the  required adjustments as a better practice. I would argue that
>>>> customized containers images should not reuse the same  name:tag
>>>> combination, and it would also help the users avoid a situation where a
>>>> runner may use a different container image in different execution
>>>> environments.
>>>> It may also help avoid issue where a user reports an issue with Beam,
>>>> that others cannot reproduce only because a user was running a customized
>>>> container on their local machine (and forgot about it).
>>>> Also, if a users' pipeline is relies on a  container image released by
>>>> Beam ( or maybe a third party), external updates to such container image
>>>> may not propagate to the pipeline workflow without an explicit pull
>>>>
>>>>> > 1. Read sdk version from gradle.properties and use this as the
>>>>> default tag. Done with Python, need to implement it with Java and Go.
>>>>>
>>>>> 100% agree with this one. Using the same tag for local and release
>>>>> images has already caused a good deal of confusion. Filed BEAM-8570 and
>>>>> BEAM-8571 [2][3].
>>>>>
>>>>> > 2. Remove pulling images before executing docker run command. This
>>>>> should be fixed for Python, Java and Go.
>>>>>
>>>>> Valentyn (from [1]):
>>>>> > I think pulling the latest image for the current tag is actually a
>>>>> desired behavior, in case the external image was updated (due to a bug fix
>>>>> for example).
>>>>>
>>>>> There's a PR for this [4]. Once we fix the default tag for Java/Go
>>>>> containers, the dev and release containers will be distinct, which makes 
>>>>> it
>>>>> seldom important whether or not the image is `docker pull`ed. Anyway, I
>>>>> agree with Thomas that implicitly running `docker pull` is confusing and
>>>>> requires some adjustments to work around. The user can always run `docker
>>>>> pull` themselves if that's the intention.
>>>>>
>>>>> [1]
>>>>> https://lists.apache.org/thread.html/0f2ccbbe7969b91dc21ba331c1a30d730e268cc0355c1ac1ba0b7988@%3Cdev.beam.apache.org%3E
>>>>> [2] https://issues.apache.org/jira/browse/BEAM-8570
>>>>> [3] https://issues.apache.org/jira/browse/BEAM-8571
>>>>> [4] https://github.com/apache/beam/pull/9972
>>>>>
>>>>> On Wed, Oct 2, 2019 at 5:32 PM Ahmet Altay <[email protected]> wrote:
>>>>>
>>>>>> I do not believe this is a blocker for Beam 2.16. I agree that it
>>>>>> would be good to fix this.
>>>>>>
>>>>>> On Wed, Oct 2, 2019 at 3:15 PM Hannah Jiang <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Thomas
>>>>>>>
>>>>>>> Thanks for bring this up.
>>>>>>>
>>>>>>> Now Python uses sdk version as a default tag, while Java and Go use
>>>>>>> latest as a default tag. I agree using latest as a tag is problematic. 
>>>>>>> The
>>>>>>> reason only Python uses sdk version as a default tag is Python has
>>>>>>> version.py so the version is easy to read. For Java and Go, we need to 
>>>>>>> read
>>>>>>> it from gradle.properties when creating images with the default tag and
>>>>>>> when setting the default image.
>>>>>>>
>>>>>>> Here is what we need to do:
>>>>>>> 1. Read sdk version from gradle.properties and use this as the
>>>>>>> default tag. Done with Python, need to implement it with Java and Go.
>>>>>>> 2. Remove pulling images before executing docker run command. This
>>>>>>> should be fixed for Python, Java and Go.
>>>>>>>
>>>>>>> Is this a blocker for 2.16? If so and above are too much work for
>>>>>>> 2.16 at the moment, we can hardcode the default tag for release branch 
>>>>>>> for
>>>>>>> now.
>>>>>>>
>>>>>>> Using timestamp as a tag is an option as well, as long as runners
>>>>>>> know which timestamp they should use.
>>>>>>>
>>>>>>> Hannah
>>>>>>>
>>>>>>> On Wed, Oct 2, 2019 at 10:13 AM Alan Myrvold <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yes, using the latest tag is problematic and can lead to unexpected
>>>>>>>> behavior.
>>>>>>>> Using a date/time or 2.17.0.dev-$USER tag would be better. The
>>>>>>>> validates container shell script uses a datetime
>>>>>>>> <https://github.com/apache/beam/blob/6551d0937ee31a8e310b63b222dbc750ec9331f8/sdks/python/container/run_validatescontainer.sh#L87>
>>>>>>>> tag, which allows a unique name if no two tests are run in the same 
>>>>>>>> second.
>>>>>>>>
>>>>>>>> On Wed, Oct 2, 2019 at 10:05 AM Thomas Weise <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Want to bump this thread.
>>>>>>>>>
>>>>>>>>> If the current behavior is to replace locally built image with the
>>>>>>>>> last published, then this is not only unexpected for developers but 
>>>>>>>>> also
>>>>>>>>> problematic for the CI, where tests should run against what was built 
>>>>>>>>> from
>>>>>>>>> source. Or am I missing something?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Thomas
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 24, 2019 at 7:08 PM Thomas Weise <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Hannah,
>>>>>>>>>>
>>>>>>>>>> I believe this is unexpected from the developer perspective. When
>>>>>>>>>> building something locally, we do expect that to be used. We may 
>>>>>>>>>> need to
>>>>>>>>>> change to not pull when the image is available locally, at least 
>>>>>>>>>> when it is
>>>>>>>>>> a snapshot/master branch. Release images should be immutable anyways.
>>>>>>>>>>
>>>>>>>>>> Thomas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 24, 2019 at 4:13 PM Hannah Jiang <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> A minor update, with custom container, the pipeline would not
>>>>>>>>>>> fail, it throws out warning and moves on to `docker run` command.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 24, 2019 at 4:05 PM Hannah Jiang <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Brian
>>>>>>>>>>>>
>>>>>>>>>>>> If we pull docker images, it always downloads from remote
>>>>>>>>>>>> repository, which is expected behavior.
>>>>>>>>>>>> In case we want to run a local image and pull it only when the
>>>>>>>>>>>> image is not available at local, we can use `docker run` command 
>>>>>>>>>>>> directly,
>>>>>>>>>>>> without pulling it in advance. [1]
>>>>>>>>>>>> In case we want to pull images only when they are not
>>>>>>>>>>>> available at local, we can use `docker images -q` to check if 
>>>>>>>>>>>> images are
>>>>>>>>>>>> existing at local before pulling it.
>>>>>>>>>>>> Another option is re-tag your local image, pass your image to
>>>>>>>>>>>> pipeline and overwrite default one, but the code is still trying 
>>>>>>>>>>>> to pull,
>>>>>>>>>>>> so if your image is not pushed to the remote repository, it would 
>>>>>>>>>>>> fail.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. https://github.com/docker/cli/pull/1498
>>>>>>>>>>>>
>>>>>>>>>>>> Hannah
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 24, 2019 at 11:56 AM Brian Hulette <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I'm working on a demo cross-language pipeline on a local flink
>>>>>>>>>>>>> cluster that relies on my python row coder PR [1]. The PR 
>>>>>>>>>>>>> includes some
>>>>>>>>>>>>> changes to the Java worker code, so I need to build a Java SDK 
>>>>>>>>>>>>> container
>>>>>>>>>>>>> locally and use that in the pipeline.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, whenever I run the pipeline,
>>>>>>>>>>>>> the apachebeam/java_sdk:latest tag is moved off of my locally 
>>>>>>>>>>>>> built image
>>>>>>>>>>>>> to a newly downloaded image with a creation date 2 weeks ago, and 
>>>>>>>>>>>>> that
>>>>>>>>>>>>> image is used instead. It looks like the reason is we run `docker 
>>>>>>>>>>>>> pull`
>>>>>>>>>>>>> before running the container [2]. As the comment says this should 
>>>>>>>>>>>>> be a
>>>>>>>>>>>>> no-op if the image already exists, but that doesn't seem to be 
>>>>>>>>>>>>> the case. If
>>>>>>>>>>>>> I just run `docker pull apachebeam/java_sdk:latest` on my local 
>>>>>>>>>>>>> machine it
>>>>>>>>>>>>> downloads the 2 week old image and happily informs me:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Status: Downloaded newer image for apachebeam/java_sdk:latest
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does anyone know how I can prevent `docker pull` from doing
>>>>>>>>>>>>> this? I can unblock myself for now just by commenting out the 
>>>>>>>>>>>>> docker pull
>>>>>>>>>>>>> command, but I'd like to understand what is going on here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://github.com/apache/beam/pull/9188
>>>>>>>>>>>>> [2]
>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java#L80
>>>>>>>>>>>>>
>>>>>>>>>>>>

Reply via email to