[
https://issues.apache.org/jira/browse/FLINK-29110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598261#comment-17598261
]
Gyula Fora commented on FLINK-29110:
------------------------------------
[@Grypse|https://github.com/Grypse] this seems like a fairly large architecture
decision that we underestimated at first. We would like to kindly ask you to
prepare your proposal in a FLIP format and post it on the DEV mailing list.
Please describe the following points in the FLIP:
* What is the benefit of using StatefulSet for TaskManagers?
** Does this only affect local recovery? What is the performance gain compared
to Deployments?
** What benefit do we have compared to the PVC available in the podTemplate?
It seems that ReadWriteMany PVCs already provide the same benefits for Deployments.
** What is the impact on TaskManager startup time? Deployments start pods
faster as they don't need to wait for the old pod to terminate.
** Should we make Deployment / StatefulSet configurable?
* What is the benefit of using StatefulSet for JobManagers?
** What data would the JobManager store in the PVC? What is the performance gain?
** Does this only affect state handles? That is a very small amount of data.
** Why can't we do this already with the podTemplate?
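Here is a concrete version of the PVC comparison. A podTemplate volume that points at one {{claimName}} is shared by every replica, so pods scheduled on different nodes generally need a ReadWriteMany claim. A StatefulSet's {{volumeClaimTemplates}} instead give each pod ordinal its own claim, named {{<template>-<statefulset>-<ordinal>}}. The plain-Java sketch below uses no Kubernetes client, and the names {{shared-log}}, {{log}} and {{flink-tm}} are hypothetical. It only lists which claim each replica would mount under the two approaches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration: which PVC each TaskManager replica would mount. */
final class PvcNaming {

    /** Deployment + podTemplate volume: every replica mounts the same named claim. */
    static List<String> sharedClaims(String claimName, int replicas) {
        return Collections.nCopies(replicas, claimName);
    }

    /** StatefulSet + volumeClaimTemplates: one claim per ordinal, "<template>-<statefulset>-<ordinal>". */
    static List<String> perOrdinalClaims(String templateName, String statefulSetName, int replicas) {
        List<String> claims = new ArrayList<>();
        for (int ordinal = 0; ordinal < replicas; ordinal++) {
            claims.add(templateName + "-" + statefulSetName + "-" + ordinal);
        }
        return claims;
    }

    public static void main(String[] args) {
        System.out.println(sharedClaims("shared-log", 3));
        // [shared-log, shared-log, shared-log]
        System.out.println(perOrdinalClaims("log", "flink-tm", 3));
        // [log-flink-tm-0, log-flink-tm-1, log-flink-tm-2]
    }
}
```

A StatefulSet pod is recreated with the same ordinal, so it reattaches the same claim after a restart. That is presumably where any local-recovery benefit would come from. The same stable identity is also why a replacement pod must wait for the old one to terminate, which is the startup-time trade-off raised in the questions.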
> Support to mount a dynamically-created pvc for JM and TM in standalone mode
> with StatefulSet.
> ---------------------------------------------------------------------------------------------
>
> Key: FLINK-29110
> URL: https://issues.apache.org/jira/browse/FLINK-29110
> Project: Flink
> Issue Type: Improvement
> Components: Kubernetes Operator
> Reporter: Peng Yuan
> Assignee: Peng Yuan
> Priority: Major
> Labels: pull-request-available
>
> Use a StatefulSet instead of a Deployment to deploy the JM and TM, to support mounting a
> dynamically created PersistentVolumeClaim.
> Add {{volumeClaimTemplates}} to JobManagerSpec and TaskManagerSpec:
> JobManagerSpec:
> {code:java}
> import io.fabric8.kubernetes.api.model.PersistentVolumeClaim;
> import io.fabric8.kubernetes.api.model.Pod;
>
> import java.util.ArrayList;
> import java.util.List;
>
> // Resource is the operator's own spec class (same package); accessors are omitted.
> public class JobManagerSpec {
>     /** Resource specification for the JobManager pods. */
>     private Resource resource;
>     /** Number of JobManager replicas. Must be 1 for non-HA deployments. */
>     private int replicas = 1;
>     /** Volume claim templates for the JobManager StatefulSet. Only used in standalone mode. */
>     private List<PersistentVolumeClaim> volumeClaimTemplates = new ArrayList<>();
>     /** JobManager pod template. It will be merged with FlinkDeploymentSpec.podTemplate. */
>     private Pod podTemplate;
> }
> {code}
> TaskManagerSpec:
> {code:java}
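> import io.fabric8.crd.generator.annotation.SpecReplicas;
> import io.fabric8.kubernetes.api.model.PersistentVolumeClaim;
> import io.fabric8.kubernetes.api.model.Pod;
>
> import java.util.ArrayList;
> import java.util.List;
>
> // Resource is the operator's own spec class (same package); accessors are omitted.
> public class TaskManagerSpec {
>     /** Resource specification for the TaskManager pods. */
>     private Resource resource;
>     /** Number of TaskManager replicas. If defined, takes precedence over parallelism. */
>     @SpecReplicas private Integer replicas;
>     /** Volume claim templates for the TaskManager StatefulSet. Only used in standalone mode. */
>     private List<PersistentVolumeClaim> volumeClaimTemplates = new ArrayList<>();
>     /** TaskManager pod template. It will be merged with FlinkDeploymentSpec.podTemplate. */
>     private Pod podTemplate;
> }
> {code}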
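The {{replicas}} comment says that an explicit value takes precedence over parallelism. Suppose the fallback is "enough TaskManagers to cover the job parallelism, given {{taskmanager.numberOfTaskSlots}}". That is an assumption, not a description of the operator's actual code. Under it, the count that would become the StatefulSet's replica number is a ceiling division. {{TaskManagerReplicas}} and {{effectiveReplicas}} are hypothetical names.

```java
/** Hypothetical sketch of deriving the TaskManager count; not the operator's actual code. */
final class TaskManagerReplicas {

    /**
     * An explicitly configured replica count wins. Otherwise, start enough TaskManagers
     * to provide `parallelism` slots at `slotsPerTaskManager` slots each (ceiling division).
     */
    static int effectiveReplicas(Integer replicas, int parallelism, int slotsPerTaskManager) {
        if (replicas != null) {
            return replicas;
        }
        if (parallelism <= 0 || slotsPerTaskManager <= 0) {
            throw new IllegalArgumentException("parallelism and slots per TaskManager must be positive");
        }
        return (parallelism + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    public static void main(String[] args) {
        System.out.println(effectiveReplicas(null, 5, 2)); // 3
        System.out.println(effectiveReplicas(1, 5, 2));    // 1
    }
}
```

The CR example uses two slots per TaskManager. With a hypothetical job parallelism of 5, this would yield 3 TaskManagers: StatefulSet ordinals 0 to 2, with one claim per ordinal for each template.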
>
> {{volumeClaimTemplates}} are only available in standalone mode [1].
> CR example:
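A validator could enforce this restriction by rejecting claim templates whenever the deployment is not in standalone mode. The sketch below uses simplified, hypothetical types: a {{Mode}} enum and plain lists of template names, instead of the operator's spec classes and fabric8 {{PersistentVolumeClaim}} objects. It is not the operator's real validation API.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical validation sketch; not the operator's actual validator API. */
final class VolumeClaimTemplateValidation {

    /** Simplified stand-in for the FlinkDeployment "mode" field. */
    enum Mode { NATIVE, STANDALONE }

    /** Returns an error message if claim templates are set outside standalone mode. */
    static Optional<String> validate(Mode mode, List<String> jmTemplates, List<String> tmTemplates) {
        if (mode == Mode.STANDALONE) {
            return Optional.empty();
        }
        if (!jmTemplates.isEmpty()) {
            return Optional.of("jobManager.volumeClaimTemplates is only supported in standalone mode");
        }
        if (!tmTemplates.isEmpty()) {
            return Optional.of("taskManager.volumeClaimTemplates is only supported in standalone mode");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate(Mode.NATIVE, List.of("log"), List.of()));
        // Optional[jobManager.volumeClaimTemplates is only supported in standalone mode]
        System.out.println(validate(Mode.STANDALONE, List.of("log"), List.of("log")));
        // Optional.empty
    }
}
```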
> {code:yaml}
> apiVersion: flink.apache.org/v1beta1
> kind: FlinkDeployment
> metadata:
>   namespace: default
>   name: basic-example
> spec:
>   image: flink:1.14.3
>   flinkVersion: v1_14
>   flinkConfiguration:
>     taskmanager.numberOfTaskSlots: "2"
>   serviceAccount: flink
>   jobManager:
>     replicas: 1
>     resource:
>       memory: "2048m"
>       cpu: 1
>     volumeClaimTemplates:
>       - metadata:
>           name: log
>         spec:
>           accessModes: [ "ReadWriteOnce" ]
>           storageClassName: "alicloud-local-lvm"
>           resources:
>             requests:
>               storage: 10Gi
>     podTemplate:
>       apiVersion: v1
>       kind: Pod
>       metadata:
>         name: job-manager-pod-template
>       spec:
>         containers:
>           - name: flink-main-container
>             volumeMounts:
>               - name: log
>                 mountPath: /opt/flink/log
>   taskManager:
>     replicas: 1 # only needed for standalone clusters
>     resource:
>       memory: "2048m"
>       cpu: 1
>     volumeClaimTemplates:
>       - metadata:
>           name: log
>         spec:
>           accessModes: [ "ReadWriteOnce" ]
>           storageClassName: "alicloud-local-lvm"
>           resources:
>             requests:
>               storage: 10Gi
>     podTemplate:
>       apiVersion: v1
>       kind: Pod
>       metadata:
>         name: task-manager-pod-template
>       spec:
>         containers:
>           - name: flink-main-container
>             volumeMounts:
>               - name: log
>                 mountPath: /opt/flink/log
>   mode: standalone
> {code}
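In each podTemplate, the {{volumeMounts}} entry named {{log}} resolves only because the StatefulSet controller adds a pod volume for every claim template and names it after the template's {{metadata.name}}. A hypothetical pre-flight check could flag any mount that matches neither a claim template nor a volume declared in the pod template. It would work on plain Java names only and ignore volumes that Flink itself adds to the pod.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical check: every user-declared volumeMount must resolve to a claim template or a pod volume. */
final class MountCheck {

    static List<String> unresolvedMounts(List<String> mountNames,
                                         List<String> claimTemplateNames,
                                         List<String> podVolumeNames) {
        // Names a mount may legally refer to.
        Set<String> available = new HashSet<>(claimTemplateNames);
        available.addAll(podVolumeNames);
        List<String> missing = new ArrayList<>();
        for (String mount : mountNames) {
            if (!available.contains(mount)) {
                missing.add(mount);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Mirrors the CR example: mount "log" backed by claim template "log", no extra pod volumes.
        System.out.println(unresolvedMounts(List.of("log"), List.of("log"), List.of()));
        // []
        System.out.println(unresolvedMounts(List.of("log", "data"), List.of("log"), List.of()));
        // [data]
    }
}
```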
> [1] [FLIP-225: Implement standalone mode support in the Kubernetes operator|https://cwiki.apache.org/confluence/display/FLINK/FLIP-225%3A+Implement+standalone+mode+support+in+the+kubernetes+operator]
>