Hey Fabian, I support option 1.
As per FLIP-42, playgrounds are going to become core to Flink's getting-started experience, and I believe it is worth the effort to get this right.

- As you mentioned, we may (and in my opinion definitely will) add more images in the future. Setting up an integration now will set the stage for those future additions.
- These images will be many users' first exposure to Flink, and having a proper release cycle to ensure they work properly may be worth the effort in and of itself. We already found during the first PR to that repo that we needed to find users with different OSs to test.
- Similarly to the above point, having the images hosted under an official Apache account adds a certain amount of credibility and shows the community that we take on-boarding new users seriously.
- I am generally opposed to having the official Flink docs rely on something that is hosted under someone's personal account. I don't want bug fixes or updates to be blocked by your (or someone else's) availability.

Seth

> On Aug 8, 2019, at 10:36 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>
> Hi everyone,
>
> As you might know, some of us are currently working on Docker-based
> playgrounds that make it very easy for first-time Flink users to try out
> and play with Flink [0].
>
> Our current setup (still work in progress with some parts merged to the
> master branch) looks as follows:
> * The playground is a Docker Compose environment [1] consisting of Flink,
> Kafka, and Zookeeper images (ZK for Kafka). The playground is based on a
> specific Flink job.
> * We had planned to add the example job of the playground as an example to
> the flink main repository to bundle it with the Flink distribution. Hence,
> it would have been included in the Docker-hub-official (soon to be
> published) Flink 1.9 Docker image [2].
> * The main motivation for adding the job to the examples module in the flink
> main repo was to avoid the maintenance overhead of a customized Docker
> image.
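[Editor's note: for readers unfamiliar with the setup being discussed, a Docker Compose environment of the kind described above might look roughly like the following sketch. The image tags, the Kafka image, and the port mappings are illustrative assumptions, not the actual flink-playgrounds files.]

```yaml
# Illustrative sketch only — not the actual playground configuration.
version: "2.2"
services:
  jobmanager:
    image: flink:1.9            # Docker-hub-official Flink image
    command: jobmanager
    ports:
      - "8081:8081"             # Flink web UI
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager:
    image: flink:1.9
    command: taskmanager
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
    depends_on:
      - jobmanager
  zookeeper:
    image: zookeeper:3.4        # needed by Kafka, not by Flink itself
  kafka:
    image: wurstmeister/kafka:2.12-2.2.1   # assumed Kafka image
    environment:
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_ADVERTISED_HOST_NAME: kafka
    depends_on:
      - zookeeper
```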
>
> When discussing whether to backport the playground job (and its data
> generator) to include it in the Flink 1.9 examples, concerns were raised
> about their Kafka dependency, which will become a problem if the community
> agrees on the recently proposed repository split, which would remove
> flink-kafka from the main repository [3]. I think this is a fair concern
> that we did not consider when designing the playground (also, the repo
> split had not been proposed yet).
>
> If we don't add the playground job to the examples, we need to put it
> somewhere else. The obvious choice would be the flink-playgrounds [4]
> repository, which was intended for the docker-compose configuration files.
> However, we would not be able to include it in the Docker-hub-official
> Flink image anymore and would need to maintain a custom Docker image,
> which we tried to avoid. The custom image would of course be based on the
> Docker-hub-official Flink image.
>
> There are different approaches for this:
>
> 1) Building one (or more) official ASF images
> There is an official Apache Docker Hub user [5], and a bunch of projects
> publish Docker images via this user. Apache Infra seems to support a
> process that automatically builds and publishes Docker images when a
> release tag is added to a repository. This feature needs to be enabled. I
> haven't found detailed documentation on this, but there are a bunch of
> INFRA Jira tickets that discuss this mechanism.
> This approach would mean that we need a formal Apache release for
> flink-playgrounds (similar to flink-shaded). The obvious benefit is that
> these images would be ASF-official Docker images. In case we can publish
> more than one image per repo, we could also publish images for other
> playgrounds (like the SQL playground, which could be based on the SQL
> training that I built [6], which uses an image that is published under my
> user [7]).
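[Editor's note: the "custom Docker image based on the Docker-hub-official Flink image" mentioned above could be as small as the following sketch. The base tag and JAR name are hypothetical placeholders, not the actual playground artifacts.]

```dockerfile
# Illustrative sketch only: layer the playground job on top of the
# Docker-hub-official Flink image instead of shipping it in flink/examples.
FROM flink:1.9

# Hypothetical job artifact name; the real playground JAR may differ.
COPY target/playground-job.jar /opt/playground-job.jar
```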
>
> 2) Rely on an external image
> This image could be built by somebody in the community (like me). The
> problem is of course that the image would not be an official image, and we
> would rely on a volunteer to build the images.
> OTOH, the overhead would be pretty small: no need to run full releases,
> no integration with Infra's build process, etc.
>
> IMO, the first approach is clearly the better choice but also needs a
> bunch of things to be put into place.
>
> What do others think?
> Does somebody have another idea?
>
> Cheers,
> Fabian
>
> [0]
> https://ci.apache.org/projects/flink/flink-docs-master/getting-started/docker-playgrounds/flink_cluster_playground.html
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/getting-started/docker-playgrounds/flink_cluster_playground.html#anatomy-of-this-playground
> [2] https://hub.docker.com/_/flink
> [3]
> https://lists.apache.org/thread.html/eb841f610ef2c191b8d00b6c07b2eab513da2e4eb2d7da5c5e6846f4@%3Cdev.flink.apache.org%3E
> [4] https://github.com/apache/flink-playgrounds
> [5] https://hub.docker.com/u/apache
> [6] https://github.com/ververica/sql-training/
> [7] https://hub.docker.com/r/fhueske/flink-sql-client-training-1.7.2