1u0 commented on a change in pull request #8741: [FLINK-12752] Add Option to Pass Seed for JobID Hash for StandaloneJobClusterEntrypoint
URL: https://github.com/apache/flink/pull/8741#discussion_r298976819
 
 

 ##########
 File path: flink-container/kubernetes/README.md
 ##########
 @@ -1,75 +1,124 @@
-# Apache Flink job cluster deployment on Kubernetes
+# Apache Flink Job Cluster Deployment on Kubernetes
 
-## Build container image using Docker
+## Job Cluster Container Images
 
-In order to deploy a job cluster on Kubernetes, you first need to build a Docker image containing Flink and the user code jar.
-Please follow the instructions you can find [here](../docker/README.md) to build a job container image.
+In order to deploy a Job Cluster on Kubernetes, you first need to build a Docker image containing
+Flink and the user code jar. Please follow the instructions you can find [here](../docker/README.md)
+to build a job container image.
 
-## Deploy Flink job cluster
+## Flink Job Cluster Deployment on Kubernetes
 
-This directory contains a predefined K8s service and two template files for the job cluster entry point and the task managers.
+This directory contains three Kubernetes resource definitions:
 
-The K8s service is used to let the cluster pods find each other.
-If you start the Flink cluster in HA mode, then this is not necessary, because the HA implementation is used to detect leaders.
+* Kubernetes Job for the Job Cluster entrypoint
+* Kubernetes Deployment for the Task Managers
+* Kubernetes Service backed by the Job Cluster Pod
 
-In order to use the template files, please replace the `${VARIABLES}` in the file with concrete values.
-The files contain the following variables:
+**Kubernetes Job for the Job Cluster Entrypoint**
 
+The Kubernetes Job for the Job Cluster entrypoint will start & restart the Job Cluster until the Job
+has reached a terminal state (e.g. CANCELLED).
+
+In case of a high-availability setup (needs to be configured in the Flink configuration) the
+previous job will be recovered from the latest checkpoint. For recovery, the JobID of the Flink Job
+needs to be stable over the lifetime of the Kubernetes Job. For this, the JobID is seeded with
+the name of the application (`${FLINK_APPLICATION_NAME}`).
+
+Without a high-availability configuration, the JobID needs to change upon restarts of the Job
+Cluster Pod. For this, the JobID should not be seeded and the respective argument ought to be
+removed from the given Pod specification of the Job.
+
+**Kubernetes Deployment for the Task Managers**
+
+The Task Manager Deployment is a simple Kubernetes Deployment. The number of replicas needs to be
+configured by replacing `${FLINK_JOB_PARALLELISM}` (by default each Task Manager provides one Task
+Slot). The Task Managers are pointed to the Kubernetes Service (see below), which is not necessary
+in case of a high-availability setup. In this case the Task Managers query ZooKeeper for the address
+of the current Job Manager.
+
+**Kubernetes Service backed by the Job Cluster Pod**
+
+This service is used to let the Task Manager Pods find the Job Cluster Pod. If you start the Flink
+cluster in HA mode, then this is not necessary, because the HA implementation is used to detect
+leaders.
 
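As a sketch of the `${VARIABLES}` substitution the README above describes (the variable names `${FLINK_APPLICATION_NAME}` and `${FLINK_JOB_PARALLELISM}` come from the README; the template snippet and the concrete values below are illustrative assumptions, not the real manifests):

```shell
# Sketch: filling in the ${VARIABLES} of a template before applying it with
# kubectl. The two-line "template" here is a stand-in for the real resource
# definitions; only the variable names come from the README.
FLINK_APPLICATION_NAME=my-flink-app
FLINK_JOB_PARALLELISM=2

template='app: ${FLINK_APPLICATION_NAME}
replicas: ${FLINK_JOB_PARALLELISM}'

# In sed's basic regex syntax, ${...} is matched literally ($ is only special
# at the end of a pattern), so the placeholder can be used as-is.
rendered=$(printf '%s\n' "$template" \
  | sed -e "s/\${FLINK_APPLICATION_NAME}/$FLINK_APPLICATION_NAME/" \
        -e "s/\${FLINK_JOB_PARALLELISM}/$FLINK_JOB_PARALLELISM/")

printf '%s\n' "$rendered"
```

The rendered output could then be piped to `kubectl apply -f -`; tools like `envsubst` achieve the same substitution in one step.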
 Review comment:
   I'd change this paragraph:
   * omit mentioning HA mode at all (I still see value in the service resource even in HA mode);
   * the service resource is useful not only for TM -> JM communication. It's also needed to access the web UI. In fact, there is a paragraph further down that mentions the `<NODE_IP>:30081` address to access it, and that is defined in the service resource.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services