Hi Team
Thanks for all the comments about beam containers.
After considering various opinions and investigating GCR and Docker Hub, we
decided to push images to Docker Hub.
Each image will have two tags, {version}_rc and {version}. The {version} tag
will be added after the release candidate image is verified.
In addition, each repository will have a *latest* tag, which always
points to the most recent verified release image, so users can pull it by
default.
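
To make the tagging rules above concrete, here is a minimal sketch in Python. It is only an illustration, not part of the release tooling: the function names are hypothetical, and it assumes "most recent" means the highest version number among verified releases.

```python
def parse_version(version):
    """Turn a string such as '2.15.0' into a comparable tuple (2, 15, 0)."""
    return tuple(int(part) for part in version.split("."))


def tags_by_version(release_candidates, verified):
    """Map each version to the tags its image should carry.

    release_candidates: versions for which an image has been pushed.
    verified: the subset of versions whose release candidate was verified.
    Assumption: 'latest' goes to the highest verified version number.
    """
    newest = max(verified, key=parse_version, default=None)
    result = {}
    for version in release_candidates:
        tags = [version + "_rc"]      # every pushed image starts as an rc
        if version in verified:
            tags.append(version)      # added only after verification
        if version == newest:
            tags.append("latest")     # most recent verified release
        result[version] = tags
    return result


if __name__ == "__main__":
    print(tags_by_version(["2.14.0", "2.15.0", "2.16.0"], {"2.14.0", "2.15.0"}))
```

In this sketch an unverified 2.16.0 image carries only its _rc tag, while *latest* stays on 2.15.0 until 2.16.0 is verified.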
Docker Hub doesn't support nested repositories, which means we should follow
the *repository:tag* format.
Using {language_version} alone as the repository name for SDK images is too
general (the version part is included only when we support multiple versions
of a language).
So I would like to include *sdk* in the repository name. Images built locally
will also have the same name.
Here are some examples:
- python2.7_sdk:2.15.0
- java_sdk:2.15.0_rc
- go_sdk:latest
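
The naming rule behind these examples can be sketched as a small Python helper. This is only an illustration of the proposed format; the function and parameter names are hypothetical and do not come from the Beam build.

```python
def image_name(language, beam_version=None, language_version="",
               release_candidate=False):
    """Build a 'repository:tag' name following the proposed scheme.

    language_version is passed only for SDKs with several supported
    language versions (for example '2.7' for Python). When beam_version
    is omitted, the 'latest' tag is used.
    """
    repository = language + language_version + "_sdk"
    if beam_version is None:
        tag = "latest"
    elif release_candidate:
        tag = beam_version + "_rc"
    else:
        tag = beam_version
    return repository + ":" + tag


if __name__ == "__main__":
    print(image_name("python", "2.15.0", language_version="2.7"))
    print(image_name("java", "2.15.0", release_candidate=True))
    print(image_name("go"))
```

Called as above, the helper reproduces the three example names in the list.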
I will proceed with this format if there is no strong opposition by
noon tomorrow (PST).
*To PMC members*:
Permission control will follow the PyPI model: all interested PMC members
will be added as admins, and release managers will be granted push
permission.
Please let me know your *Docker ID* if you want to be added as an admin.
Thanks,
Hannah
On Wed, Sep 4, 2019 at 3:47 PM Thomas Weise <[email protected]> wrote:
> This will greatly simplify trying out portable runners:
> https://beam.apache.org/documentation/runners/flink/#executing-a-beam-pipeline-on-a-flink-cluster
>
> Can't wait for the following to disappear from the instructions page:
> ./gradlew :sdks:python:container:docker
>
> On Wed, Sep 4, 2019 at 3:35 PM Thomas Weise <[email protected]> wrote:
>
>> Awesome, thank you!
>>
>>
>> On Wed, Sep 4, 2019 at 3:22 PM Hannah Jiang <[email protected]>
>> wrote:
>>
>>> Hi Thomas
>>>
>>> I created snapshot images from head as of around 2PM today.
>>> You can pull images from gcr.io/apache-beam-testing/beam/sdks/snapshot.
>>>
>>> Thanks,
>>> Hannah
>>>
>>> On Wed, Sep 4, 2019 at 1:41 PM Thomas Weise <[email protected]> wrote:
>>>
>>>> Hi Hannah,
>>>>
>>>> Thank you, I know how to build the containers locally, but not how to
>>>> publish them!
>>>>
>>>> The cwiki says "Publishing images to gcr.io/beam requires permissions
>>>> in apache-beam-testing project."
>>>>
>>>> Can I get access to the testing project (at least temporarily) and what
>>>> would I need to setup to run the publish target that is shown on cwiki?
>>>>
>>>> Thanks,
>>>> Thomas
>>>>
>>>>
>>>> On Wed, Sep 4, 2019 at 11:06 AM Hannah Jiang <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Thomas
>>>>>
>>>>> I haven't uploaded any snapshot images yet. Here is how you can create
>>>>> one from head.
>>>>> > cd [...]/beam/
>>>>> # For Python, where {version} is one of 2, 35, 36, 37
>>>>> > ./gradlew :sdks:python:container:py{version}:docker
>>>>> # For Java
>>>>> > ./gradlew -p sdks/java/container docker
>>>>> # For Go
>>>>> > ./gradlew -p sdks/go/container docker
>>>>>
>>>>> The 2.15 one is just for testing, not a real 2.15.0, nor a snapshot
>>>>> from head.
>>>>>
>>>>> Please let me know if you have any questions.
>>>>> Hannah
>>>>>
>>>>> On Wed, Sep 4, 2019 at 10:57 AM Thomas Weise <[email protected]> wrote:
>>>>>
>>>>>> I actually found something in [1], but it is 2.15 unfortunately.
>>>>>>
>>>>>> [1]
>>>>>> https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release/python2.7?gcrImageListsize=30
>>>>>>
>>>>>> On Wed, Sep 4, 2019 at 10:35 AM Thomas Weise <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks for working on this. Do you happen to have publicly
>>>>>>> accessible snapshots published for your testing currently (even when the
>>>>>>> final location isn't sorted out)?
>>>>>>>
>>>>>>> I would like to use a 2.16 based Python SDK image for working on my
>>>>>>> downstream project, but could not find anything in
>>>>>>> gcr.io/apache-beam-testing/beam/sdks/rc/snapshot
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Thomas
>>>>>>>
>>>>>>> On Fri, Aug 30, 2019 at 10:56 AM Valentyn Tymofieiev <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> On Tue, Aug 27, 2019 at 3:35 PM Hannah Jiang <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi team
>>>>>>>>>
>>>>>>>>> I am working on improving Docker container support for Beam. We
>>>>>>>>> would like to publish prebuilt containers for each release version and
>>>>>>>>> daily snapshot. The current work focuses on release images only, and it
>>>>>>>>> would be part of the release process.
>>>>>>>>>
>>>>>>>>> The release images will be pushed to GCR, where they are publicly
>>>>>>>>> accessible (pullable). We will use the following locations.
>>>>>>>>> *Repository*: gcr.io/beam
>>>>>>>>> *Project*: apache-beam-testing
>>>>>>>>> More details, including the naming and tagging scheme, can be found on
>>>>>>>>> the wiki
>>>>>>>>> <https://cwiki.apache.org/confluence/display/BEAM/%5BWIP%5D+SDKHarness+Container+Image+Release+Process>,
>>>>>>>>> which was written by several contributors.
>>>>>>>>>
>>>>>>>>> I would like to discuss these two questions.
>>>>>>>>> *1. How many tests do we need to run before pushing images to GCR?*
>>>>>>>>> Publishing artifacts is the last step of the release process, so
>>>>>>>>> by that point we have already verified the whole codebase. In
>>>>>>>>> addition, many Jenkins tests use containers, so the containers have
>>>>>>>>> already been verified several times. Do we need to run the tests
>>>>>>>>> again?
>>>>>>>>>
>>>>>>>>
>>>>>>>> In a docker repository, one container image can have multiple tags.
>>>>>>>> One possibility is that on the last step of the release process, after
>>>>>>>> sufficient testing, we place a production tag on an image that was
>>>>>>>> already pushed with a dev tag.
>>>>>>>>
>>>>>>>> For example, a dev tag may look like
>>>>>>>> gcr.io/apache-beam/python37:2.16.0-RC4 and a production tag may look
>>>>>>>> like gcr.io/apache-beam/python37:2.16.0; both will refer to the same
>>>>>>>> image at the end.
>>>>>>>>
>>>>>>>> We should also plan what the process of updating the container
>>>>>>>> image will look like, if we need to release the image with
>>>>>>>> additional changes, and how we will test these changes before the final
>>>>>>>> push (or placing production tag).
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> *2. How many tests do we need to run to validate pushed images?*
>>>>>>>>> When we push the images, we assume they work and pass
>>>>>>>>> all the tests. After pushing, we should confirm that the images are
>>>>>>>>> pullable and usable. I suggest we run several tests on Dataflow with
>>>>>>>>> each pushed image. What do you think?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think it makes sense to do. Beam runners that use SDK container
>>>>>>>> images should have some continuously running tests that periodically
>>>>>>>> check that all supported images are pullable and still compatible
>>>>>>>> with the runner.
>>>>>>>>
>>>>>>>>> This work can be refined later as we explore more during our
>>>>>>>>> release process.
>>>>>>>>> Please comment or edit the wiki page or reply to this email with
>>>>>>>>> your opinions.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Hannah
>>>>>>>>>
>>>>>>>>