I do not believe this is a blocker for Beam 2.16. I agree that it would be
good to fix this.

On Wed, Oct 2, 2019 at 3:15 PM Hannah Jiang <[email protected]> wrote:

> Hi Thomas
>
> Thanks for bringing this up.
>
> Now Python uses sdk version as a default tag, while Java and Go use latest
> as a default tag. I agree using latest as a tag is problematic. The reason
> only Python uses sdk version as a default tag is Python has version.py so
> the version is easy to read. For Java and Go, we need to read it from
> gradle.properties when creating images with the default tag and when
> setting the default image.
>
> Here is what we need to do:
> 1. Read sdk version from gradle.properties and use this as the default
> tag. Done with Python, need to implement it with Java and Go.
> 2. Remove pulling images before executing docker run command. This should
> be fixed for Python, Java and Go.
>
> Is this a blocker for 2.16? If so, and the above is too much work for 2.16
> at the moment, we can hardcode the default tag for the release branch for now.
>
> Using timestamp as a tag is an option as well, as long as runners know
> which timestamp they should use.
>
> Hannah
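
For illustration, step 1 above could be sketched in Python roughly as follows. This is a sketch under assumptions, not Beam's build code: the function names are hypothetical, and the property key `version` is an assumption about what gradle.properties contains.

```python
def read_gradle_property(text: str, key: str) -> str:
    """Return the value of `key` from properties-style text (key=value lines)."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the two comment styles of .properties files.
        if not line or line.startswith(("#", "!")):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    raise KeyError(key)


def default_image(repo: str, properties_text: str, key: str = "version") -> str:
    """Build a `repo:tag` reference using the SDK version as the tag.

    The key name `version` is an assumption, not confirmed by the thread.
    """
    return f"{repo}:{read_gradle_property(properties_text, key)}"
```

A caller would pass the contents of gradle.properties as `properties_text`, so the default image no longer resolves to `latest`.
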
>
> On Wed, Oct 2, 2019 at 10:13 AM Alan Myrvold <[email protected]> wrote:
>
>> Yes, using the latest tag is problematic and can lead to unexpected
>> behavior.
>> Using a date/time or 2.17.0.dev-$USER tag would be better. The validates
>> container shell script uses a datetime
>> <https://github.com/apache/beam/blob/6551d0937ee31a8e310b63b222dbc750ec9331f8/sdks/python/container/run_validatescontainer.sh#L87>
>> tag, which allows a unique name if no two tests are run in the same second.
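
As a rough illustration of the tagging idea above, here is a small Python sketch that combines both suggestions (a per-user suffix and a second-resolution timestamp). The function name `dev_tag` and the exact tag layout are hypothetical; the linked shell script may format its tag differently.

```python
import getpass
from datetime import datetime
from typing import Optional


def dev_tag(version: str,
            now: Optional[datetime] = None,
            user: Optional[str] = None) -> str:
    """Build a tag such as `<version>-<user>-<YYYYmmdd-HHMMSS>`.

    The tag is unique per user as long as no two builds start in the
    same second, which is the caveat mentioned above.
    """
    now = now or datetime.now()
    user = user or getpass.getuser()
    return f"{version}-{user}-{now:%Y%m%d-%H%M%S}"
```
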
>>
>> On Wed, Oct 2, 2019 at 10:05 AM Thomas Weise <[email protected]> wrote:
>>
>>> Want to bump this thread.
>>>
>>> If the current behavior is to replace locally built image with the last
>>> published, then this is not only unexpected for developers but also
>>> problematic for the CI, where tests should run against what was built from
>>> source. Or am I missing something?
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> On Tue, Sep 24, 2019 at 7:08 PM Thomas Weise <[email protected]> wrote:
>>>
>>>> Hi Hannah,
>>>>
>>>> I believe this is unexpected from the developer perspective. When
>>>> building something locally, we do expect that to be used. We may need to
>>>> change to not pull when the image is available locally, at least when it is
>>>> a snapshot/master branch. Release images should be immutable anyways.
>>>>
>>>> Thomas
>>>>
>>>>
>>>> On Tue, Sep 24, 2019 at 4:13 PM Hannah Jiang <[email protected]>
>>>> wrote:
>>>>
>>>>> A minor update: with a custom container, the pipeline would not fail; it
>>>>> emits a warning and moves on to the `docker run` command.
>>>>>
>>>>> On Tue, Sep 24, 2019 at 4:05 PM Hannah Jiang <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Brian
>>>>>>
>>>>>> If we pull docker images, it always downloads from the remote
>>>>>> repository, which is expected behavior.
>>>>>> In case we want to run a local image and pull it only when the image
>>>>>> is not available locally, we can use the `docker run` command directly,
>>>>>> without pulling it in advance. [1]
>>>>>> In case we want to pull images only when they are not available
>>>>>> locally, we can use `docker images -q` to check whether the images exist
>>>>>> locally before pulling them.
>>>>>> Another option is to re-tag your local image and pass it to the
>>>>>> pipeline to override the default one, but the code still tries to pull,
>>>>>> so if your image is not pushed to the remote repository, it would fail.
>>>>>>
>>>>>> 1. https://github.com/docker/cli/pull/1498
>>>>>>
>>>>>> Hannah
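
The check-before-pull option described above might look roughly like this in Python. It is a minimal sketch, not the actual DockerCommand.java logic: `pull_if_missing` and the injectable `run` parameter are hypothetical names, and only `docker images -q` and `docker pull` are taken from the thread.

```python
import subprocess
from typing import Callable, List


def _run(cmd: List[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def pull_if_missing(image: str,
                    run: Callable[[List[str]], str] = _run) -> bool:
    """Pull `image` only when no local copy exists.

    Returns True if a pull was performed, False if a local image was found.
    """
    # `docker images -q <ref>` prints the IDs of matching local images,
    # and prints nothing when there is no match.
    if run(["docker", "images", "-q", image]).strip():
        return False
    run(["docker", "pull", image])
    return True
```

With this shape, a locally built `apachebeam/java_sdk:latest` would be left alone, and a pull would happen only when nothing is tagged under that name. The `run` parameter exists only so the logic can be exercised without a Docker daemon.
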
>>>>>>
>>>>>> On Tue, Sep 24, 2019 at 11:56 AM Brian Hulette <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm working on a demo cross-language pipeline on a local flink
>>>>>>> cluster that relies on my python row coder PR [1]. The PR includes some
>>>>>>> changes to the Java worker code, so I need to build a Java SDK container
>>>>>>> locally and use that in the pipeline.
>>>>>>>
>>>>>>> Unfortunately, whenever I run the pipeline, the
>>>>>>> apachebeam/java_sdk:latest tag is moved off of my locally built image
>>>>>>> to a newly downloaded image with a creation date 2 weeks ago, and that
>>>>>>> image is used instead. It looks like the reason is we run `docker pull`
>>>>>>> before running the container [2]. As the comment says this should be a
>>>>>>> no-op if the image already exists, but that doesn't seem to be the case.
>>>>>>> If I just run `docker pull apachebeam/java_sdk:latest` on my local
>>>>>>> machine it downloads the 2 week old image and happily informs me:
>>>>>>>
>>>>>>> Status: Downloaded newer image for apachebeam/java_sdk:latest
>>>>>>>
>>>>>>> Does anyone know how I can prevent `docker pull` from doing this? I
>>>>>>> can unblock myself for now just by commenting out the docker pull
>>>>>>> command, but I'd like to understand what is going on here.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Brian
>>>>>>>
>>>>>>> [1] https://github.com/apache/beam/pull/9188
>>>>>>> [2]
>>>>>>> https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java#L80
>>>>>>>
>>>>>>
