tillrohrmann commented on a change in pull request #14254:
URL: https://github.com/apache/flink/pull/14254#discussion_r532455932
##########
File path: docs/deployment/ha/kubernetes_ha.md
##########
@@ -23,77 +23,50 @@ specific language governing permissions and limitations
under the License.
-->
-## Kubernetes Cluster High Availability
-Kubernetes high availability service could support both [standalone Flink on Kubernetes]({% link deployment/resource-providers/standalone/kubernetes.md %}) and [native Kubernetes integration]({% link deployment/resource-providers/native_kubernetes.md %}).
+Flink's Kubernetes HA services use [Kubernetes](https://kubernetes.io/) for high availability services.
-When running Flink JobManager as a Kubernetes deployment, the replica count should be configured to 1 or greater.
-* The value `1` means that a new JobManager will be launched to take over leadership if the current one terminates exceptionally.
-* The value `N` (greater than 1) means that multiple JobManagers will be launched simultaneously while one is active and others are standby. Starting more than one JobManager will make the recovery faster.
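As an aside, the replica count discussed in the removed bullets above lives on the JobManager Deployment. A trimmed sketch, assuming the `flink-jobmanager` name from the standalone Kubernetes guide (required `selector`/`template` fields omitted for brevity):

{% highlight yaml %}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  # 1: a replacement JobManager is started if the current one dies.
  # N > 1: standby JobManagers run alongside the leader for faster failover.
  replicas: 1
{% endhighlight %}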
+* Toc
+{:toc}
-### Configuration
-{% highlight yaml %}
-kubernetes.cluster-id: <ClusterId>
-high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
-high-availability.storageDir: hdfs:///flink/recovery
-{% endhighlight %}
+Kubernetes high availability services can only be used when deploying to Kubernetes.
+Consequently, they can be configured when using [standalone Flink on Kubernetes]({% link deployment/resource-providers/standalone/kubernetes.md %}) or the [native Kubernetes integration]({% link deployment/resource-providers/native_kubernetes.md %})
-#### Example: Highly Available Standalone Flink Cluster on Kubernetes
-Both session and job/application clusters support using the Kubernetes high availability service. Users just need to add the following Flink config options to [flink-configuration-configmap.yaml]({% link deployment/resource-providers/standalone/kubernetes.md %}#common-cluster-resource-definitions). All other yamls do not need to be updated.
-
-<span class="label label-info">Note</span> The filesystem which corresponds to the scheme of your configured HA storage directory must be available to the runtime. Refer to [custom Flink image]({% link deployment/resource-providers/standalone/docker.md %}#customize-flink-image) and [enable plugins]({% link deployment/resource-providers/standalone/docker.md %}#using-plugins) for more information.
-
-{% highlight yaml %}
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: flink-config
-  labels:
-    app: flink
-data:
-  flink-conf.yaml: |+
-    ...
-    kubernetes.cluster-id: <ClusterId>
-    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
-    high-availability.storageDir: hdfs:///flink/recovery
-    restart-strategy: fixed-delay
-    restart-strategy.fixed-delay.attempts: 10
-    ...
-{% endhighlight %}
+## Configuration
-#### Example: Highly Available Native Kubernetes Cluster
-Using the following command to start a native Flink application cluster on Kubernetes with high availability configured.
-{% highlight bash %}
-$ ./bin/flink run-application -p 8 -t kubernetes-application \
-  -Dkubernetes.cluster-id=<ClusterId> \
-  -Dtaskmanager.memory.process.size=4096m \
-  -Dkubernetes.taskmanager.cpu=2 \
-  -Dtaskmanager.numberOfTaskSlots=4 \
-  -Dkubernetes.container.image=<CustomImageName> \
-  -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory \
-  -Dhigh-availability.storageDir=s3://flink/flink-ha \
-  -Drestart-strategy=fixed-delay -Drestart-strategy.fixed-delay.attempts=10 \
-  -Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-{{site.version}}.jar \
-  -Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-{{site.version}}.jar \
-  local:///opt/flink/examples/streaming/StateMachineExample.jar
-{% endhighlight %}
+In order to start an HA-cluster you have to configure the following configuration keys:
-### High Availability Data Clean Up
-Currently, when a Flink job reached the terminal state (`FAILED`, `CANCELED`, `FINISHED`), all the HA data, including metadata in Kubernetes ConfigMap and HA state on DFS, will be cleaned up.
+- **high-availability mode** (required):
+The [`high-availability`]({% link deployment/config.md %}#high-availability-1) option has to be set to `KubernetesHaServicesFactory`.
-So the following command will only shut down the Flink session cluster and leave all the HA related ConfigMaps, state untouched.
-{% highlight bash %}
-$ echo 'stop' | ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=<ClusterId> -Dexecution.attached=true
-{% endhighlight %}
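Should the leftover HA ConfigMaps mentioned above need to be removed by hand, a sketch along these lines might work; the label selector is an assumption about the labels Flink attaches, not something stated in this hunk:

{% highlight bash %}
# Hypothetical manual cleanup after stopping the session cluster;
# assumes Flink labels its HA ConfigMaps with app=<ClusterId> and
# configmap-type=high-availability.
$ kubectl delete configmap --selector='app=<ClusterId>,configmap-type=high-availability'
{% endhighlight %}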
+  <pre>high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory</pre>
+
+- **Storage directory** (required):
+JobManager metadata is persisted in the file system [`high-availability.storageDir`]({% link deployment/config.md %}#high-availability-storagedir) and only a pointer to this state is stored in ZooKeeper.
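For illustration, the required keys from this change gathered into one `flink-conf.yaml` sketch; the `<ClusterId>` placeholder and the `s3://` path are illustrative values, not part of this hunk:

{% highlight yaml %}
kubernetes.cluster-id: <ClusterId>
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
# Any file system reachable from the cluster works as the storage
# directory; the s3 bucket below is a placeholder.
high-availability.storageDir: s3://flink/recovery
{% endhighlight %}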
Review comment:
Good catch :-)