Hey all,

First of all, a big thank you for driving many of the Docker image releases
over the last two years.

*(1) Moving docker-flink/docker-flink to apache/docker-flink*

+1 to do this as you outlined. I would propose to aim for a first
integration with the 1.10 release without major changes to the existing
Dockerfiles. The work items would be to move the Dockerfiles and update the
release process documentation so everyone is on the same page.

*(2) Consolidate Dockerfiles in apache/flink*

+1 to start the process for this. I think this requires a bit of thinking
about what the requirements are and which problems we want to solve. From
skimming the existing Dockerfiles, it seems to me that the Docker image
builds fulfil quite a few different tasks. We have a script that can bundle
Hadoop, can copy an existing Flink distribution, can include user jars,
etc. The scope of this is quite broad and would warrant a design document/a
FLIP.

I would move the questions about nightly builds, using a different base
image or having image variants with debug tooling to after (1) and (2) or
make it part of (2).

*(3) Next steps*

If there are no objections, I would propose to tackle (1) and (2) separately
and to continue as follows:

(i) Create tickets for (1) and aim to align with the 1.10 release timeline
(ideally before the first RC). Since this does not touch any code in the
release branches, I think this would not be affected by the feature freeze.
The major work items would be updating the docs and potentially refactoring
the existing process and Dockerfiles. I can help with the process of
creating a new repo.

(ii) Create a first draft for the consolidation of the existing Dockerfiles.
Once this proposal is done, I would propose to bring it up for a separate
discussion on the ML.


What do you think? @Patrick: would you be interested in working on both (1)
+ (2) or did you mainly have (1) in mind?

Best,

Ufuk

On Sun, Jan 12, 2020 at 8:30 PM Konstantin Knauf <konstan...@ververica.com>
wrote:

> Big +1 for
>
> * official images in a separate repository
> * unified images (session cluster vs application cluster)
> * images for development in the Apache Flink repository
>
> On Fri, Jan 10, 2020 at 7:14 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > Thanks a lot for starting this discussion Patrick! I think it is a very
> > good idea to move Flink's docker image more under the jurisdiction of the
> > Flink PMC and to make releasing new docker images part of Flink's
> > release process (not saying that we cannot release new docker images
> > independent of Flink's release cycle).
> >
> > One thing I have no strong opinion about is where to place the Dockerfiles
> > (apache/flink.git vs. apache/flink-docker.git). I see the point that one
> > wants to separate concerns (Flink code vs. Dockerfiles) and, hence, that
> > having separate repositories might help with this objective. But on the
> > other hand, I don't have a lot of experience with Docker Hub and how to
> > best host Dockerfiles. Consequently, it would be helpful if others who
> > have some experience with this could share it with us.
> >
> > Cheers,
> > Till
> >
> > On Sat, Dec 21, 2019 at 2:28 PM Hequn Cheng <chenghe...@gmail.com> wrote:
> >
> > > Hi Patrick,
> > >
> > > Thanks a lot for your continued work on the Docker images. That’s really
> > > a great job, and I have also benefited from it.
> > >
> > > Big +1 for integrating Docker image publication into the Flink release
> > > process, since we can leverage the release process to give the Docker
> > > publication more legitimacy. We can also check and vote on it during
> > > the release.
> > >
> > > I think the most important thing we need to discuss first is whether to
> > > have a dedicated git repo for the Dockerfiles.
> > >
> > > Although it is a convention shared by nearly every other “official” image
> > > on Docker Hub to have a dedicated repo, I'm still not sure about it.
> > > Maybe I have missed something important. From my point of view, it’s
> > > better to have the Dockerfiles in the (main) Flink repo.
> > >   - First, I think the Dockerfiles can be treated as part of the release,
> > > and it is natural to put the corresponding version of the Dockerfile
> > > in the corresponding Flink release.
> > >   - Second, we can put the Dockerfiles in a path like
> > > flink/docker-flink/version/, where the version varies between releases.
> > > For example, for release 1.8.3, we have a flink/docker-flink/1.8.3
> > > folder (or maybe flink/docker-flink/1.8). The Dockerfiles for all
> > > supported versions would then not be in one path, but they would still
> > > be in one Git tree under different refs.
> > >   - Third, it seems that Docker Hub also supports specifying different
> > > refs. For the file[1], we can change the GitRepo link from
> > > https://github.com/docker-flink/docker-flink.git to
> > > https://github.com/apache/flink.git and add a GitFetch for each tag,
> > > e.g., GitFetch: refs/tags/release-1.8.3. There are some examples in the
> > > ubuntu file[2].
> > >
> > > If the above assumptions are right and there are no more obstacles, I'm
> > > inclined to keep these Dockerfiles in the main Flink repo. In this case,
> > > we can reduce the number of repos and the management overhead.
> > > What do you think?
> > >
> > > Best,
> > > Hequn
> > >
> > > [1] https://github.com/docker-library/official-images/blob/master/library/flink
> > > [2] https://github.com/docker-library/official-images/blob/master/library/ubuntu
> > >
> > >
> > > On Fri, Dec 20, 2019 at 5:29 PM Yang Wang <danrtsey...@gmail.com> wrote:
> > >
> > > >  Big +1 for this effort.
> > > >
> > > > It is really exciting that we have started this great work. More and more
> > > > companies are starting to use Flink in container environments (Docker,
> > > > Kubernetes, Mesos, even Yarn-3.x), so it is very important to have a
> > > > unified, official process for building and releasing images.
> > > >
> > > >
> > > > The image building process in this proposal is really good, and I just
> > > > have the following thoughts.
> > > >
> > > > >> Keep a dedicated repo for Dockerfiles to build official image
> > > > I think this is a good approach, and we do not need to make unnecessary
> > > > changes to the Flink repository.
> > > >
> > > > >> Integrate building image into the Flink release process
> > > > It will bring a better experience for container environment users. In
> > > > my opinion, a complete release includes the official image, and it
> > > > should be verified to work well.
> > > >
> > > > >> Nightly building
> > > > Would we support this for all release branches or just the master branch?
> > > >
> > > > >> Multiple purpose Flink images
> > > > This is really needed. During development and testing, we need some
> > > > profiling tools to help us investigate problems. Currently, we do not
> > > > even have jstack/jmap in the image.
> > > >
> > > > >> Unify the Dockerfile in Flink repository
> > > > In the current code base, we have flink-contrib/docker-flink/Dockerfile
> > > > to build an image for a session cluster. However, it is not kept up to
> > > > date. For a per-job cluster, flink-container/docker/Dockerfile could be
> > > > used to build a Flink image with user artifacts. I think we need to
> > > > unify them and provide a more powerful build script and entry point.
> > > >
> > > >
> > > >
> > > > Best,
> > > > Yang
> > > >
> > > > On Thu, Dec 19, 2019 at 9:20 PM Patrick Lucas <patr...@ververica.com> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > >
> > > > > I would like to start a discussion about integrating publication of the
> > > > > Flink Docker images hosted on Docker Hub[1] more tightly with the Flink
> > > > > release process. Apologies in advance for the long post.
> > > > >
> > > > > More than two and a half years ago (time flies!) we introduced “official”
> > > > > Docker images for Flink[2]. Since then, the popularity of running
> > > > > containerized applications in general and containerized Flink in
> > > > > particular has continued to grow. Today, Flink is one of the most popular
> > > > > “official” images on Docker Hub[3].
> > > > >
> > > > > > A graph of Flink Docker image pulls over time:
> > > > > > https://gist.githubusercontent.com/patricklucas/7312444b1056ff82528e9a129e74e2b3/raw/9c8e139c1abc70b2b3fb34aadd7f44d46a540fe8/docker-flink-pulls.png
> > > > >
> > > > > “Official” is in quotation marks because while that’s how the Docker
> > > > > community refers to top-level images on Docker Hub (i.e. those that can
> > > > > be run with just <docker run foo>), they are not official in the sense
> > > > > of being officially endorsed by the Flink PMC.
> > > > >
> > > > > I think it’s time for that to change.
> > > > >
> > > > > Currently, the Dockerfiles that produce these images are maintained in a
> > > > > repository called docker-flink[4] in a separate, community-managed GitHub
> > > > > organization of the same name. When a new release of Flink is available,
> > > > > or when other changes are necessary, these Dockerfiles (one per image)
> > > > > are updated, and then a pull request[5] is made to the Docker Hub
> > > > > official-images repo with an updated manifest of images and tags, after
> > > > > which infrastructure run by Docker Hub builds, checks, and publishes the
> > > > > images.
> > > > >
> > > > > A question that has come up regularly is “Why are the Dockerfiles in a
> > > > > separate repository from Flink?”, and there are a few different answers:
> > > > >
> > > > >    - These Dockerfiles package only released, published distributions of
> > > > >      Flink, and are therefore decoupled from a particular commit in the
> > > > >      Flink repo
> > > > >    - All the Dockerfiles for supported versions (and the corresponding
> > > > >      Scala version variants) should be available in one Git tree for
> > > > >      discoverability
> > > > >    - The master branch of Flink is not the right place to encode what the
> > > > >      supported versions are, or how to run previous versions of Flink; it
> > > > >      should be concerned with the point-in-time of the code represented in
> > > > >      that commit
> > > > >
> > > > > But mostly, having a dedicated repo for Dockerfiles is a convention
> > > > > shared by nearly every other “official” image on Docker Hub[6]. If the
> > > > > Flink community wants to do this differently, we will need to work with
> > > > > the Docker Hub maintainers to make sure we continue to work within their
> > > > > guidelines and expectations.
> > > > >
> > > > > While it seems intuitive that integrating these images into the Flink
> > > > > release process is a good thing, I don’t believe it is strictly
> > > > > necessary, since the images only package approved and signed Flink
> > > > > releases, and do not themselves build Flink from source. However, there
> > > > > are some concrete advantages:
> > > > >
> > > > >    - Putting the Docker images on (almost) equal footing with Flink binary
> > > > >      release artifacts will help the legitimacy of and user confidence in
> > > > >      running Flink in containerized environments
> > > > >    - By publishing release candidate (and possibly nightly) images, the
> > > > >      release testing and automated testing processes could be improved
> > > > >    - The delay between Flink releases and when the corresponding Docker
> > > > >      images are available will be reduced
> > > > >
> > > > >
> > > > > Considering all of this, I propose the following:
> > > > >
> > > > >    - We move the Git repository containing the Dockerfiles from the
> > > > >      docker-flink GitHub organization to Apache, placing it under control
> > > > >      of the Flink PMC
> > > > >    - We codify updating these Dockerfiles and notifying Docker Hub into the
> > > > >      Flink release process
> > > > >       - For release candidates, Dockerfiles should be added to a special
> > > > >         directory which will be automatically built and pushed to the
> > > > >         Apache Docker Hub organization[7], e.g. apache/flink-rc:1.10.0-rc1
> > > > >       - Upon release, the appropriate “release” Dockerfiles are added
> > > > >         (e.g. under the 1.10 directory) and release candidate Dockerfiles
> > > > >         removed, and then a pull request opened on the
> > > > >         docker-library/official-images repository
> > > > >    - Optionally, we introduce “nightly” builds, with an automated process
> > > > >      building and pushing images to the Apache Docker Hub organization,
> > > > >      e.g. apache/flink-dev:1.10-SNAPSHOT
> > > > >
> > > > >
> > > > > If we choose to move forward in this direction, there are some further
> > > > > steps we could take to improve the experience of both developing and
> > > > > using Flink with Docker (these are actually mostly orthogonal to the
> > > > > proposed changes above, but I think this is a natural first step and
> > > > > should make the following ideas easier to implement).
> > > > >
> > > > > First, there are important differences between images meant for running
> > > > > Flink and those meant for development: the former should strictly
> > > > > package only released distributions of software and be as thin a layer
> > > > > as possible over the software itself, while the latter can be used
> > > > > during development and testing, and can easily be rebuilt from a
> > > > > “working copy” of the software’s source code.
> > > > >
> > > > > By standardizing on defining such “production” images in the
> > > > > docker-flink repository and “development” image(s) in the Flink
> > > > > repository itself, it becomes much clearer to developers and users which
> > > > > Dockerfile or image they should use for a given purpose. To that end, we
> > > > > could introduce one or more documented Maven goals or Make targets for
> > > > > building a Docker image from the current source tree or a specific
> > > > > release (including unreleased or unsupported versions).
> > > > >
> > > > > Additionally, there has been discussion among Flink contributors for some
> > > > > time about the confusing state of Dockerfiles within the Flink
> > > > > repository, each meant for a different way of running Flink. I’m not
> > > > > completely up to speed about these different efforts, but we could
> > > > > possibly solve this by either building additional “official” images with
> > > > > different entrypoints for these various purposes, or by developing an
> > > > > improved entrypoint script that conveniently supports all cases. I defer
> > > > > to Till Rohrmann, Konstantin Knauf, or Stephan Ewen for further
> > > > > discussion on this point.
> > > > >
> > > > > I apologize again for the wall of text, but if you made it this far,
> > > > > thank you! These improvements have been a long time coming, and I hope
> > > > > we can find a solution that serves the Flink and Docker communities
> > > > > well. Please don’t hesitate to ask any questions.
> > > > >
> > > > > --
> > > > >
> > > > > Patrick Lucas
> > > > >
> > > > > [1] https://hub.docker.com/_/flink
> > > > >
> > > > > [2] https://lists.apache.org/thread.html/c50297f8659aaa59d4f2ae327b69c4d46d1ab8ecc53138e659e4fe91%40%3Cdev.flink.apache.org%3E
> > > > >
> > > > > [3] On page 2 at the time we went to press:
> > > > > https://hub.docker.com/search?q=&type=image&image_filter=official
> > > > >
> > > > > [4] https://github.com/docker-flink/docker-flink
> > > > >
> > > > > [5] https://github.com/docker-library/official-images/pulls?q=is%3Apr+label%3Alibrary%2Fflink
> > > > >
> > > > > [6] I looked at the 25 most popular “official” images (see [3]) as well
> > > > > as “official” images of Apache software from the top 125; all use a
> > > > > dedicated repo
> > > > >
> > > > > [7] https://hub.docker.com/u/apache
> > > > >
> > > >
> > >
> >
>
>
> --
>
> Konstantin Knauf | Solutions Architect
