azagrebin commented on a change in pull request #12131:
URL: https://github.com/apache/flink/pull/12131#discussion_r425023459



##########
File path: docs/ops/deployment/docker.md
##########
@@ -24,119 +24,476 @@ under the License.
 -->
 
 [Docker](https://www.docker.com) is a popular container runtime. 
-There are Docker images for Apache Flink available on Docker Hub which can be used to deploy a session cluster.
-The Flink repository also contains tooling to create container images to deploy a job cluster.
+There are Docker images for Apache Flink available [on Docker Hub](https://hub.docker.com/_/flink).
+You can use the docker images to deploy a Session or Job cluster in a containerised environment, e.g.
+[standalone Kubernetes](kubernetes.html) or [native Kubernetes](native_kubernetes.html).
 
 * This will be replaced by the TOC
 {:toc}
 
-## Flink session cluster
-
-A Flink session cluster can be used to run multiple jobs. 
-Each job needs to be submitted to the cluster after it has been deployed. 
-
-### Docker images
+## Docker Hub Flink images
 
 The [Flink Docker repository](https://hub.docker.com/_/flink/) is hosted on
 Docker Hub and serves images of Flink version 1.2.1 and later.
 
-Images for each supported combination of Hadoop and Scala are available, and tag aliases are provided for convenience.
+### Image tags
 
-Beginning with Flink 1.5, image tags that omit a Hadoop version (e.g.
-`-hadoop28`) correspond to Hadoop-free releases of Flink that do not include a
-bundled Hadoop distribution.
+Images for each supported combination of Flink and Scala versions are available, and
+[tag aliases](https://hub.docker.com/_/flink?tab=tags) are provided for convenience.
 
-For example, the following aliases can be used: *(`1.5.y` indicates the latest
-release of Flink 1.5)*
+For example, you can use the following aliases: *(`1.11.y` indicates the latest release of Flink 1.11)*
 
 * `flink:latest` → `flink:<latest-flink>-scala_<latest-scala>`
-* `flink:1.5` → `flink:1.5.y-scala_2.11`
-* `flink:1.5-hadoop27` → `flink:1.5.y-hadoop27-scala_2.11`
+* `flink:1.11` → `flink:1.11.y-scala_2.11`
+
+<span class="label label-info">Note</span> Prior to Flink 1.5, Hadoop dependencies were always bundled with Flink.
+You can see that certain tags include the version of Hadoop, e.g. `-hadoop28`.
+Beginning with Flink 1.5, image tags that omit the Hadoop version correspond to Hadoop-free releases of Flink
+that do not include a bundled Hadoop distribution.
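These tags can be pulled directly with the docker CLI; for example (the concrete tags below are only illustrative, check [the tag list](https://hub.docker.com/_/flink?tab=tags) for what is currently published):

```shell
# Pull via an alias tag and via a fully qualified tag
docker pull flink:1.11
docker pull flink:1.11.0-scala_2.12
```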
+
+## How to run the Flink image
+
+The Flink image contains a regular Flink distribution with its default configuration and a standard entry point script.
+You can run its standard entry point in the following modes:
+* Flink Master for [a Session cluster](#start-a-session-cluster)
+* Flink Master for [a Single Job cluster](#start-a-single-job-cluster)
+* TaskManager for any cluster
+
+This allows you to deploy a standalone cluster (Session or Single Job) in any containerised environment, for example:
+* manually in a local docker setup,
+* [in a Kubernetes cluster](kubernetes.html),
+* [with Docker Compose](#flink-with-docker-compose),
+* [with Docker swarm](#flink-with-docker-swarm).
+
+<span class="label label-info">Note</span> [Native Kubernetes](native_kubernetes.html) also runs the same image by default
+and deploys TaskManagers on demand, so that you do not have to do it manually.
+
+The next chapters describe how to start a single Flink docker container for various purposes.
+
+### Start a Session Cluster
+
+A Flink Session cluster can be used to run multiple jobs. Each job needs to be submitted to the cluster after it has been deployed.
+To deploy a Flink Session cluster with docker, you need to start a Flink Master container:
+
+```sh
+docker run flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} jobmanager
+```
+
+and the required number of TaskManager containers:
+
+```sh
+docker run flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} taskmanager
+```
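For the TaskManagers to find the Flink Master, the containers need to reach each other by hostname. A minimal local sketch, assuming a user-defined Docker network and the `JOB_MANAGER_RPC_ADDRESS` environment variable read by the image's entry point script (the network and container names here are only examples):

```shell
# Example only: create a network so the containers can resolve each other by name
docker network create flink-network

# Start the Flink Master; JOB_MANAGER_RPC_ADDRESS tells all processes where the master is
docker run -d --name flink-jobmanager --network flink-network \
    -e JOB_MANAGER_RPC_ADDRESS=flink-jobmanager \
    -p 8081:8081 \
    flink:latest jobmanager

# Start a TaskManager pointing at the same master
docker run -d --name flink-taskmanager --network flink-network \
    -e JOB_MANAGER_RPC_ADDRESS=flink-jobmanager \
    flink:latest taskmanager
```

Publishing port 8081 exposes the web UI and REST endpoint of the Flink Master on the host.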
+
+### Start a Single Job Cluster
+
+A Flink Job cluster is a dedicated cluster which runs a single job.
+The job artifacts should be already available locally in the container and, thus, there is no extra job submission needed.
+To deploy a cluster for a single job with docker, you need to
+* make job artifacts available locally *in all containers* under `/opt/flink/usrlib`,
+* start a Flink Master container in the Single Job mode,
+* and start the required number of TaskManager containers.
+
+To make the **job artifacts available** locally in the container, you can
+
+* **either mount a volume** (or multiple volumes) with the artifacts to `/opt/flink/usrlib` when you start
+the Flink image as a Single Job Master and the required number of TaskManagers:
+
+    ```sh
+    docker run \
+        --mount type=bind,src=/host/path/to/job/artifacts1,target=/opt/flink/usrlib/artifacts/1 \
+        --mount type=bind,src=/host/path/to/job/artifacts2,target=/opt/flink/usrlib/artifacts/2 \
+        flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} standalone-job \
+        --job-classname com.job.ClassName \
+        --job-id <job id> \
+        [--fromSavepoint /path/to/savepoint [--allowNonRestoredState]] \
+        [job arguments]
+
+    docker run \
+        --mount type=bind,src=/host/path/to/job/artifacts1,target=/opt/flink/usrlib/artifacts/1 \
+        --mount type=bind,src=/host/path/to/job/artifacts2,target=/opt/flink/usrlib/artifacts/2 \
+        flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} taskmanager
+    ```
+
+* **or extend the Flink image** by writing a custom `Dockerfile`, build it and start the custom image as a Single Job Master
+and the required number of TaskManagers:
+
+    ```dockerfile
+    FROM flink
+    ADD /host/path/to/job/artifacts/1 /opt/flink/usrlib/artifacts/1
+    ADD /host/path/to/job/artifacts/2 /opt/flink/usrlib/artifacts/2
+    ```
+
+    ```sh
+    docker build -t flink_with_job_artifacts .
+    docker run \
+        flink_with_job_artifacts standalone-job \
+        --job-classname com.job.ClassName \
+        --job-id <job id> \
+        [--fromSavepoint /path/to/savepoint [--allowNonRestoredState]] \
+        [job arguments]
+
+    docker run flink_with_job_artifacts taskmanager
+    ```
+
+The `standalone-job` argument starts a Flink Master container in the Single Job mode.
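When restoring a job from a savepoint, the savepoint path must also be reachable from inside the containers. A sketch, assuming the `flink_with_job_artifacts` image built above; the host path, container target path, and savepoint name are only examples:

```shell
# Bind-mount the host directory holding the savepoint so the path exists inside the container
docker run \
    --mount type=bind,src=/host/path/to/savepoints,target=/opt/flink/savepoints \
    flink_with_job_artifacts standalone-job \
    --job-classname com.job.ClassName \
    --fromSavepoint /opt/flink/savepoints/savepoint-123abc
```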
+
+#### Flink Master additional command line arguments
+
+You can provide the following additional command line arguments to the cluster entrypoint:
+
+* `--job-classname <job class name>`: Class name of the job to run.
+
+  By default, Flink scans its class path for a JAR with a Main-Class or program-class manifest entry and chooses it as the job class.
+  Use this command line argument to manually set the job class.
+  This argument is required if there is no JAR, or more than one JAR, with such a manifest entry on the class path.
+
+* `--job-id <job id>` (optional): Manually set a Flink job ID for the job (default: 00000000000000000000000000000000).
+
+* `--fromSavepoint /path/to/savepoint` (optional): Restore from a savepoint
 
-**Note:** The Docker images are provided as a community project by individuals
-on a best-effort basis. They are not official releases by the Apache Flink PMC.
+  In order to resume from a savepoint, you also need to pass the savepoint path.
+  Note that `/path/to/savepoint` needs to be accessible in the docker container locally

Review comment:
       I think everywhere we can say `in all containers`, but we can remove `locally` to avoid confusion, as it can be a remote path.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

