Hi Patrick,

Thanks a lot for your continued work on the Docker images. It is a really
great job, and I have benefited from it as well.

Big +1 for integrating Docker image publication into the Flink release
process. It lets us rely on the release process to make the Docker
publication more legitimate, and we can also check and vote on the images
during the release.

I think the most important thing we need to discuss first is whether to have
a dedicated git repo for the Dockerfiles.

Although it is a convention shared by nearly every other “official” image on
Docker Hub to have a dedicated repo, I'm still not sure about it. Maybe I
have missed something important. From my point of view, it is better to
have the Dockerfiles in the main Flink repo.
  - First, I think the Dockerfiles can be treated as part of the release, so
it is natural to put each version of the Dockerfile in the corresponding
Flink release.
  - Second, we can put the Dockerfiles in a path like
flink/docker-flink/version/, where the version varies between releases.
For example, for release 1.8.3, we have a flink/docker-flink/1.8.3
folder (or maybe flink/docker-flink/1.8). Even though the Dockerfiles for
all supported versions are then not in one path, they are still in one Git
tree, under different refs.
  - Third, it seems Docker Hub also supports specifying different refs.
In the manifest file [1], we can change the GitRepo link from
https://github.com/docker-flink/docker-flink.git to
https://github.com/apache/flink.git and add a GitFetch for each tag, e.g.,
GitFetch: refs/tags/release-1.8.3. There are some examples in the ubuntu
file [2].
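
To make the third point concrete, here is a rough sketch of what one entry in
the manifest file [1] might look like after such a change. Only the GitRepo
and GitFetch values come from the points above. The Maintainers line, the
Tags value, the GitCommit placeholder, and the Directory path are assumptions
for illustration, and I have not verified that the official-images tooling
accepts this exact layout.

```
# Hypothetical sketch, not the current library/flink file.
# Maintainers, Tags, GitCommit and Directory are placeholder assumptions.
Maintainers: Some Maintainer <maintainer@example.org> (@some-maintainer)
GitRepo: https://github.com/apache/flink.git

Tags: 1.8.3
GitFetch: refs/tags/release-1.8.3
GitCommit: <commit that the release-1.8.3 tag points to>
Directory: docker-flink/1.8.3
```

Each supported release would get its own block with its own GitFetch, so the
manifest itself would remain the single place that lists all supported
versions.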

If the above assumptions are right and there are no other obstacles, I would
prefer to have these Dockerfiles in the main Flink repo. That way, we can
reduce the number of repos and the management overhead.
What do you think?

Best,
Hequn

[1]
https://github.com/docker-library/official-images/blob/master/library/flink
[2]
https://github.com/docker-library/official-images/blob/master/library/ubuntu


On Fri, Dec 20, 2019 at 5:29 PM Yang Wang <danrtsey...@gmail.com> wrote:

>  Big +1 for this effort.
>
> It is really exciting that we have started this great work. More and more
> companies are starting to use Flink in container environments (Docker,
> Kubernetes, Mesos, even YARN 3.x), so it is very important that we have a
> unified official image building and releasing process.
>
>
> The image building process in this proposal is really good, and I just have
> the following thoughts.
>
> >> Keep a dedicated repo for Dockerfiles to build official image
> I think this is a good approach, and it avoids unnecessary changes to the
> Flink repository.
>
> >> Integrate building image into the Flink release process
> It will bring a better experience for container environment users. In my
> opinion, a complete release includes the official image, which should be
> verified to work well.
>
> >> Nightly building
> Would we support this for all release branches or just the master branch?
>
> >> Multiple purpose Flink images
> This is really needed indeed. During development and testing, we need
> profiling tools to help us investigate problems. Currently, we do not even
> have jstack/jmap in the image.
>
> >> Unify the Dockerfile in Flink repository
> In the current code base, we have flink-contrib/docker-flink/Dockerfile to
> build an image for a session cluster. However, it is not kept up to date.
> For a per-job cluster, flink-container/docker/Dockerfile could be used to
> build a Flink image with user artifacts. I think we need to unify them and
> provide a more powerful build script and entry point.
>
>
>
> Best,
> Yang
>
> Patrick Lucas <patr...@ververica.com> wrote on Thu, Dec 19, 2019 at 9:20 PM:
>
> > Hi everyone,
> >
> >
> > I would like to start a discussion about integrating publication of the
> > Flink Docker images hosted on Docker Hub[1] more tightly with the Flink
> > release process. Apologies in advance for the long post.
> >
> > More than two and a half years ago (time flies!) we introduced “official”
> > Docker images for Flink[2]. Since then, the popularity of running
> > containerized applications in general and containerized Flink in
> > particular has continued to grow. Today, Flink is one of the most popular
> > “official” images on Docker Hub[3].
> >
> > > A graph of Flink Docker image pulls over time:
> >
> >
> https://gist.githubusercontent.com/patricklucas/7312444b1056ff82528e9a129e74e2b3/raw/9c8e139c1abc70b2b3fb34aadd7f44d46a540fe8/docker-flink-pulls.png
> >
> > “Official” is in quotation marks because while that’s how the Docker
> > community refers to top-level images on Docker Hub (i.e. those that can
> > be run with just <docker run foo>), they are not official in the sense of
> > being officially endorsed by the Flink PMC.
> >
> > I think it’s time for that to change.
> >
> > Currently, the Dockerfiles that produce these images are maintained in a
> > repository called docker-flink[4] in a separate, community-managed GitHub
> > organization of the same name. When a new release of Flink is available,
> > or when other changes are necessary, these Dockerfiles (one per image) are
> > updated, and then a pull request[5] is made to the Docker Hub
> > official-images repo with an updated manifest of images and tags, after
> > which infrastructure run by Docker Hub builds, checks, and publishes the
> > images.
> >
> > A question that has come up regularly is “Why are the Dockerfiles in a
> > separate repository from Flink?”, and there are a few different answers:
> >
> >    - These Dockerfiles package only released, published distributions of
> >      Flink, and are therefore decoupled from a particular commit in the
> >      Flink repo
> >    - All the Dockerfiles for supported versions (and the corresponding
> >      Scala version variants) should be available in one Git tree for
> >      discoverability
> >    - The master branch of Flink is not the right place to encode what the
> >      supported versions are, or how to run previous versions of Flink; it
> >      should be concerned with the point-in-time of the code represented in
> >      that commit
> >
> >
> > But mostly, having a dedicated repo for Dockerfiles is a convention
> > shared by nearly every other “official” image on Docker Hub[6]. If the
> > Flink community wants to do this differently, we will need to work with
> > the Docker Hub maintainers to make sure we continue to work within their
> > guidelines and expectations.
> >
> > While it seems intuitive that integrating these images into the Flink
> > release process is a good thing, I don’t believe it is strictly
> > necessary, since the images only package approved and signed Flink
> > releases, and do not themselves build Flink from source. However, there
> > are some concrete advantages:
> >
> >    - Putting the Docker images on (almost) equal footing with Flink binary
> >      release artifacts will help the legitimacy of and user confidence in
> >      running Flink in containerized environments
> >    - By publishing release candidate (and possibly nightly) images, the
> >      release testing and automated testing processes could be improved
> >    - The delay between Flink releases and when the corresponding Docker
> >      images are available will be reduced
> >
> >
> > Considering all of this, I propose the following:
> >
> >    - We move the Git repository containing the Dockerfiles from the
> >      docker-flink GitHub organization to Apache, placing it under control
> >      of the Flink PMC
> >    - We codify updating these Dockerfiles and notifying Docker Hub into the
> >      Flink release process
> >       - For release candidates, Dockerfiles should be added to a special
> >         directory which will be automatically built and pushed to the
> >         Apache Docker Hub organization[7], e.g. apache/flink-rc:1.10.0-rc1
> >       - Upon release, the appropriate “release” Dockerfiles are added (e.g.
> >         under the 1.10 directory) and release candidate Dockerfiles
> >         removed, and then a pull request opened on the
> >         docker-library/official-images repository
> >    - Optionally, we introduce “nightly” builds, with an automated process
> >      building and pushing images to the Apache Docker Hub organization,
> >      e.g. apache/flink-dev:1.10-SNAPSHOT
> >
> >
> > If we choose to move forward in this direction, there are some further
> > steps we could take to improve the experience of both developing and
> > using Flink with Docker (these are actually mostly orthogonal to the
> > proposed changes above, but I think this is a natural first step and
> > should make the following ideas easier to implement).
> >
> > First, there are important differences between images meant for running
> > Flink and those meant for development: the former should strictly package
> > only released distributions of software and be as thin of a layer as
> > possible over the software itself, while the latter can be used during
> > development and testing, and can easily be rebuilt from a “working copy”
> > of the software’s source code.
> >
> > By standardizing on defining such “production” images in the docker-flink
> > repository and “development” image(s) in the Flink repository itself, it
> > becomes much clearer to developers and users which Dockerfile or image
> > they should use for a given purpose. To that end, we could introduce one
> > or more documented Maven goals or Make targets for building a Docker image
> > from the current source tree or a specific release (including unreleased
> > or unsupported versions).
> >
> > Additionally, there has been discussion among Flink contributors for some
> > time about the confusing state of Dockerfiles within the Flink
> > repository, each meant for a different way of running Flink. I’m not
> > completely up to speed about these different efforts, but we could
> > possibly solve this by either building additional “official” images with
> > different entrypoints for these various purposes, or by developing an
> > improved entrypoint script that conveniently supports all cases. I defer
> > to Till Rohrmann, Konstantin Knauf, or Stephan Ewen for further
> > discussion on this point.
> >
> > I apologize again for the wall of text, but if you made it this far,
> > thank you! These improvements have been a long time coming, and I hope we
> > can find a solution that serves the Flink and Docker communities well.
> > Please don’t hesitate to ask any questions.
> >
> > --
> >
> > Patrick Lucas
> >
> > [1] https://hub.docker.com/_/flink
> >
> > [2]
> >
> >
> https://lists.apache.org/thread.html/c50297f8659aaa59d4f2ae327b69c4d46d1ab8ecc53138e659e4fe91%40%3Cdev.flink.apache.org%3E
> >
> > [3] On page 2 at the time we went to press:
> > https://hub.docker.com/search?q=&type=image&image_filter=official
> >
> > [4] https://github.com/docker-flink/docker-flink
> >
> > [5]
> >
> >
> https://github.com/docker-library/official-images/pulls?q=is%3Apr+label%3Alibrary%2Fflink
> >
> > [6] I looked at the 25 most popular “official” images (see [3]) as well as
> > “official” images of Apache software from the top 125; all use a dedicated
> > repo
> > [7] https://hub.docker.com/u/apache
> >
>
