Thanks for the reply, Andrey.

Regarding building from local dist:
- Yes, I bring this up mostly for development purposes. Since k8s is popular, I believe more and more developers will want to test their work on a k8s cluster. I'm not sure all developers should have to write a custom Dockerfile themselves in this scenario. Thus, I still prefer to provide a script for devs.
- I agree to keep the scope of this FLIP mostly to normal users. But as far as I can see, supporting building from a local dist would not take much extra effort.
- The maven docker plugin sounds good. I'll take a look at it.
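For illustration, such a dev script could look roughly like this (a hypothetical sketch; the function, path, and image names are assumptions, not part of the FLIP). It stages a locally built Flink dist into a docker build context with a generated Dockerfile:

```shell
# Hypothetical sketch of a "build from local dist" helper for developers.
# All names (build_context_from_dist, build-target, flink:local-dev) are
# assumptions, not actual FLIP-111 utilities.
build_context_from_dist() {
    dist="$1"                                   # e.g. flink-dist/target/.../build-target
    [ -d "$dist" ] || { echo "no dist at $dist; run 'mvn package' first" >&2; return 1; }
    ctx="$(mktemp -d)"                          # fresh docker build context
    cp -r "$dist" "$ctx/flink"
    cat > "$ctx/Dockerfile" <<'EOF'
FROM openjdk:8-jre
COPY flink /opt/flink
ENV PATH=/opt/flink/bin:$PATH
EOF
    # A real script would now run: docker build -t flink:local-dev "$ctx"
    echo "$ctx"
}

# Demo with a stand-in for a locally built dist directory:
fake_dist="$(mktemp -d)/build-target"
mkdir -p "$fake_dist/bin"
ctx="$(build_context_from_dist "$fake_dist")"
```

The actual `docker build` invocation is left as a comment so the sketch only shows the staging logic.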
Regarding supporting Java 11:
- I'm not sure it is necessary to ship Java. Maybe we could just change the base image from openjdk:8-jre to openjdk:11-jre in the template docker file[1]. Correct me if I understand incorrectly. Also, I agree to move this out of the scope of this FLIP if it indeed takes much extra effort.

Regarding the custom configuration, the mechanism that Thomas mentioned LGTM.

[1] https://github.com/apache/flink-docker/blob/master/Dockerfile-debian.template

Best,
Yangze Guo

On Wed, Mar 11, 2020 at 5:52 AM Thomas Weise <t...@apache.org> wrote:
>
> Thanks for working on improvements to the Flink Docker container images. This
> will be important as more and more users are looking to adopt Kubernetes and
> other deployment tooling that relies on Docker images.
>
> A generic, dynamic configuration mechanism based on environment variables is
> essential, and it is already supported via envsubst and an environment
> variable that can supply a configuration fragment:
>
> https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L88
> https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L85
>
> This gives the necessary control for infrastructure use cases that aim to
> supply deployment tooling to other users. An example in this category is
> the FlinkK8sOperator:
>
> https://github.com/lyft/flinkk8soperator/tree/master/examples/wordcount
>
> On the flip side, attempting to support a fixed subset of configuration
> options is brittle and will probably lead to compatibility issues down the
> road:
>
> https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L97
>
> Besides the configuration, it may be worthwhile to see in which other ways
> the base Docker images can provide more flexibility to incentivize wider
> adoption.
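The generic mechanism Thomas points at can be sketched roughly as follows (a simplified illustration, not the exact flink-docker entrypoint code; the `FLINK_PROPERTIES` variable name and file paths are assumptions): an environment variable supplies a config fragment that is appended to flink-conf.yaml, and envsubst expands any `${VAR}` placeholders before the Flink process starts.

```shell
# Sketch of an envsubst-based dynamic configuration step in a docker
# entrypoint. FLINK_PROPERTIES and the paths are assumed names.
FLINK_HOME="$(mktemp -d)"                        # stand-in for /opt/flink in this sketch
mkdir -p "$FLINK_HOME/conf"
CONF="$FLINK_HOME/conf/flink-conf.yaml"
echo "jobmanager.rpc.address: jobmanager" > "$CONF"

FLINK_PROPERTIES="rest.port: 8081"               # would come from `docker run -e FLINK_PROPERTIES=...`
if [ -n "$FLINK_PROPERTIES" ]; then
    # Append the user-supplied configuration fragment.
    printf '%s\n' "$FLINK_PROPERTIES" >> "$CONF"
fi
if command -v envsubst >/dev/null 2>&1; then     # expand ${VAR} placeholders, if the tool is present
    envsubst < "$CONF" > "$CONF.tmp" && mv "$CONF.tmp" "$CONF"
fi
```

Because the fragment is free-form, this avoids hard-coding a fixed subset of configuration options in the entrypoint.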
>
> I would second that it is desirable to support Java 11 and, in general, to use a
> base image that allows the (straightforward) use of more recent versions of
> other software (Python etc.)
>
> https://github.com/apache/flink-docker/blob/d3416e720377e9b4c07a2d0f4591965264ac74c5/Dockerfile-debian.template#L19
>
> Thanks,
> Thomas
>
> On Tue, Mar 10, 2020 at 12:26 PM Andrey Zagrebin <azagre...@apache.org> wrote:
>>
>> Hi All,
>>
>> Thanks a lot for the feedback!
>>
>> *@Yangze Guo*
>>
>> > - Regarding the flink_docker_utils#install_flink function, I think it
>> > should also support build from local dist and build from a
>> > user-defined archive.
>>
>> I suppose you bring this up mostly for development purposes or power users.
>> Most normal users are usually interested in mainstream released versions
>> of Flink. Although you raise a valid concern, my idea was to keep the scope
>> of this FLIP mostly to those normal users.
>> Power users are usually capable of designing a completely custom Dockerfile
>> themselves. At the moment, we already have custom Dockerfiles, e.g. for tests in
>> flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile.
>> We can add something similar for development purposes and maybe introduce a
>> special maven goal. There is a maven docker plugin, afaik.
>> I will add this to the FLIP as a next step.
>>
>> > - It seems that the install_shaded_hadoop could be an option of
>> > install_flink
>>
>> I would rather think about this as a separate, independent, optional step.
>>
>> > - Should we support JAVA 11? Currently, most of the docker file based on
>> > JAVA 8.
>>
>> Indeed, it is a valid concern. The Java version is a fundamental property of
>> the docker image. Customising this in the current mainstream image is
>> difficult; this would require shipping it w/o Java at all.
>> Or this is a separate discussion whether we want to distribute docker hub
>> images with different Java versions or just bump it to Java 11.
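For illustration, a custom Dockerfile could parameterise the Java base image roughly like this (a hedged sketch; the `JAVA_BASE` build arg and the build-arg workflow are assumptions, not the actual flink-docker template):

```shell
# Sketch: one Dockerfile template yielding both Java 8 and Java 11 images by
# swapping the base image, instead of shipping Java inside the Flink image.
# ARG name and tags are assumptions for illustration.
workdir="$(mktemp -d)"
cat > "$workdir/Dockerfile.flink" <<'EOF'
ARG JAVA_BASE=openjdk:8-jre
FROM ${JAVA_BASE}
COPY flink /opt/flink
ENTRYPOINT ["/opt/flink/bin/docker-entrypoint.sh"]
EOF
# The Java 11 variant would then be built with (not executed here):
#   docker build --build-arg JAVA_BASE=openjdk:11-jre -t flink:java11 -f "$workdir/Dockerfile.flink" .
```

Note that an `ARG` declared before `FROM` only scopes to the `FROM` line itself, which is all this sketch needs.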
>> This should be easy in a custom Dockerfile for development purposes though,
>> as mentioned before.
>>
>> > - I do not understand how to set config options through
>> > "flink_docker_utils configure"? Does this step happen during the image
>> > build or the container start? If it happens during the image build,
>> > there would be a new image every time we change the config. If it is just
>> > a part of the container entrypoint, I think there is no need to add a
>> > configure command, we could just add all dynamic config options to the
>> > args list of "start_jobmaster"/"start_session_jobmanager". Am I
>> > understanding this correctly?
>>
>> `flink_docker_utils configure ...` can be called anywhere:
>> - while building a custom image (`RUN flink_docker_utils configure ..`) by
>> extending our base image from docker hub (`FROM flink`)
>> - in a custom entry point as well
>> I will check this, but if the user can also pass a dynamic config option, that
>> also sounds like a good option.
>> Our standard entry point script in the base image could just properly forward
>> the arguments to the Flink process.
>>
>> @Yang Wang
>>
>> > About docker utils
>> > I really like the idea to provide some utils for the docker file and entry
>> > point. The `flink_docker_utils` will help to build the image easier. I am
>> > not sure about the `flink_docker_utils start_jobmaster`. Do you mean when
>> > we build a docker image, we need to add `RUN flink_docker_utils
>> > start_jobmaster` in the docker file? Why do we need this?
>>
>> This is a scripted action to start the JM. It can be called anywhere.
>> Indeed, it does not make too much sense to run it in a Dockerfile.
>> Mostly, the idea was to use it in a custom entry point. When our base docker
>> hub image is started, its entry point can also be completely overridden.
>> The actions are also sorted in the FLIP: for the Dockerfile or for the entry
>> point. E.g.
>> our standard entry point script in the base docker hub image can
>> already use it.
>> Anyway, it was just an example; the details are to be defined in Jira, imo.
>>
>> > About docker entry point
>> > I agree with you that the docker entry point could be more powerful with
>> > more functionality. Mostly, it is about overriding the config options.
>> > If we support dynamic properties, I think it is more convenient for users
>> > without any learning curve.
>> > `docker run flink session_jobmanager -D rest.bind-port=8081`
>>
>> Indeed, as mentioned before, it can be a better option.
>> The standard entry point also decides at least what to run: JM or TM. I
>> think we will see what else makes sense to include there during the
>> implementation.
>> Some specifics may be more convenient to set with env vars, as Konstantin
>> mentioned.
>>
>> > About the logging
>> > Updating the `log4j-console.properties` to support multiple appenders is
>> > a better option. Currently, the native K8s docs suggest that users debug
>> > the logs in this way[1]. However, there are also some problems. The stderr
>> > and stdout of the JM/TM processes could not be forwarded to the docker
>> > container console.
>>
>> Strange, we should check; maybe there is a docker option to query the
>> container's stderr output as well.
>> If we forward the Flink process stdout as usual in a bash console, it should
>> not be a problem. Why can it not be forwarded?
>>
>> @Konstantin Knauf
>>
>> > For the entrypoint, have you considered also allowing configuration to be
>> > set via environment variables, as in "docker run -e FLINK_REST_BIN_PORT=8081
>> > ..."? This is quite common and more flexible, e.g. it makes it very easy to
>> > pass values of Kubernetes Secrets into the Flink configuration.
>>
>> This is indeed an interesting option to pass arguments to the entry point
>> in general.
>> For the config options, the dynamic args can be a better option, as
>> mentioned above.
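The entry-point behaviour discussed here (first argument selects the process, remaining `-D` dynamic properties are forwarded to it) can be sketched as follows. The command names and the echoed exec lines are assumptions for illustration, not the real flink-docker entrypoint:

```shell
# Sketch of an entry point that dispatches on a mode argument and forwards
# the remaining arguments (e.g. -D dynamic properties) to the Flink process.
# session_jobmanager/taskmanager and the *.sh script names are assumed.
flink_entrypoint() {
    cmd="$1"; shift
    case "$cmd" in
        session_jobmanager) echo "exec jobmanager.sh start-foreground $*" ;;
        taskmanager)        echo "exec taskmanager.sh start-foreground $*" ;;
        *)                  echo "exec $cmd $*" ;;   # fall through to an arbitrary command
    esac
}

# What `docker run flink session_jobmanager -D rest.bind-port=8081` would resolve to:
resolved="$(flink_entrypoint session_jobmanager -D rest.bind-port=8081)"
echo "$resolved"
```

The fall-through case keeps the image usable as a plain base image when a completely custom command is given.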
>>
>> > With respect to logging, I would opt to keep this very basic and to only
>> > support logging to the console (maybe with a fix for the web user
>> > interface). For everything else, users can easily build their own images
>> > based on library/flink (provide the dependencies, change the logging
>> > configuration).
>>
>> Agree.
>>
>> Thanks,
>> Andrey
>>
>> On Sun, Mar 8, 2020 at 8:55 PM Konstantin Knauf <konstan...@ververica.com>
>> wrote:
>>
>> > Hi Andrey,
>> >
>> > thanks a lot for this proposal. The variety of Docker files in the project
>> > has been causing quite some confusion.
>> >
>> > For the entrypoint, have you considered also allowing configuration to be
>> > set via environment variables, as in "docker run -e
>> > FLINK_REST_BIN_PORT=8081 ..."? This is quite common and more flexible, e.g.
>> > it makes it very easy to pass values of Kubernetes Secrets into the Flink
>> > configuration.
>> >
>> > With respect to logging, I would opt to keep this very basic and to only
>> > support logging to the console (maybe with a fix for the web user
>> > interface). For everything else, users can easily build their own images
>> > based on library/flink (provide the dependencies, change the logging
>> > configuration).
>> >
>> > Cheers,
>> >
>> > Konstantin
>> >
>> >
>> > On Thu, Mar 5, 2020 at 11:01 AM Yang Wang <danrtsey...@gmail.com> wrote:
>> >
>> >> Hi Andrey,
>> >>
>> >> Thanks for driving this significant FLIP. From the user ML, we can also
>> >> see that there are many users running Flink in container environments, where
>> >> the docker image is a very basic requirement. Just as you say, we should
>> >> provide a unified place for all the various usages (e.g. session, job,
>> >> native k8s, swarm, etc.).
>> >>
>> >> > About docker utils
>> >>
>> >> I really like the idea to provide some utils for the docker file and
>> >> entry point. The `flink_docker_utils` will help to build the image more easily.
>> >> I am not sure about the `flink_docker_utils start_jobmaster`. Do you mean
>> >> when we build a docker image, we need to add `RUN flink_docker_utils
>> >> start_jobmaster` in the docker file? Why do we need this?
>> >>
>> >> > About docker entry point
>> >>
>> >> I agree with you that the docker entry point could be more powerful with
>> >> more functionality. Mostly, it is about overriding the config options. If
>> >> we support dynamic properties, I think it is more convenient for users
>> >> without any learning curve.
>> >> `docker run flink session_jobmanager -D rest.bind-port=8081`
>> >>
>> >> > About the logging
>> >>
>> >> Updating the `log4j-console.properties` to support multiple appenders is
>> >> a better option. Currently, the native K8s docs suggest that users debug
>> >> the logs in this way[1]. However, there are also some problems. The stderr
>> >> and stdout of the JM/TM processes could not be forwarded to the docker
>> >> container console.
>> >>
>> >> [1]
>> >> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#log-files
>> >>
>> >> Best,
>> >> Yang
>> >>
>> >> On Wed, Mar 4, 2020 at 5:34 PM, Andrey Zagrebin <azagre...@apache.org> wrote:
>> >>
>> >>> Hi All,
>> >>>
>> >>> If you have ever touched the docker topic in Flink, you
>> >>> probably noticed that we have multiple places in the docs and repos which
>> >>> address its various concerns.
>> >>>
>> >>> We have prepared a FLIP [1] to simplify users' perception of the docker
>> >>> topic in Flink. It mostly advocates an approach of extending the official
>> >>> Flink image from docker hub. For convenience, it can come with a set of
>> >>> bash utilities and documented examples of their usage.
>>> The utilities allow users to:
>>>
>>> - run the docker image in various modes (single job, session master,
>>> task manager, etc.)
>>> - customise the extending Dockerfile
>>> - and its entry point
>>>
>>> Eventually, the FLIP suggests removing all other user-facing Dockerfiles
>>> and build scripts from the Flink repo, moving all docker docs to
>>> apache/flink-docker, and adjusting existing docker use cases to refer to
>>> this new approach (mostly Kubernetes now).
>>>
>>> The first contributed version of the Flink docker integration also contained
>>> an example and docs for the integration with Bluemix in the IBM cloud. We
>>> also suggest maintaining it outside of the Flink repository (cc Markus Müller).
>>>
>>> Thanks,
>>> Andrey
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification
>>>
>>
>> --
>>
>> Konstantin Knauf | Head of Product
>>
>> +49 160 91394525
>>
>> Follow us @VervericaData Ververica <https://www.ververica.com/>
>>
>> --
>>
>> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
>> Conference
>>
>> Stream Processing | Event Driven | Real Time
>>
>> --
>>
>> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>>
>> --
>> Ververica GmbH
>> Registered at Amtsgericht Charlottenburg: HRB 158244 B
>> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
>> (Tony) Cheng