[ 
https://issues.apache.org/jira/browse/FLINK-29110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585188#comment-17585188
 ] 

Peng Yuan commented on FLINK-29110:
-----------------------------------

When I use a custom StorageClass to create a ReadWriteOnce PVC for many TMs, it fails.

The difference is:

With {*}Deployments{*}:

We can either create a single PVC for all TMs, in which case all TMs share 
the performance of one single volume.

Or we can manually create n PVCs for n TMs and map them accordingly (this 
imitates the behavior of volumeClaimTemplates and opens us up to all sorts of 
problems because of the "manual" approach).

With {*}StatefulSets{*}:
When a pod is started, a claim is created for it based on the template and the 
pod's name (achieving the needed result without any fancy management from the 
operator).
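
To make the naming rule concrete, here is a minimal sketch of how the claim 
names are derived. Kubernetes names each claim 
<template name>-<StatefulSet name>-<ordinal>. The StatefulSet name 
"basic-example-taskmanager" and the class ClaimNames are assumptions made up 
for illustration; they are not names used by the operator.

```java
import java.util.ArrayList;
import java.util.List;

class ClaimNames {
    /** Name of the PVC created for one replica: <template>-<statefulSet>-<ordinal>. */
    static String claimName(String template, String statefulSet, int ordinal) {
        return template + "-" + statefulSet + "-" + ordinal;
    }

    /** Claim names for all replicas; ordinals start at 0. */
    static List<String> claimNames(String template, String statefulSet, int replicas) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            names.add(claimName(template, statefulSet, i));
        }
        return names;
    }

    public static void main(String[] args) {
        // "basic-example-taskmanager" is a hypothetical StatefulSet name.
        claimNames("log", "basic-example-taskmanager", 3).forEach(System.out::println);
    }
}
```

Because the name depends only on the template, the StatefulSet, and the 
ordinal, nobody has to keep a separate pod-to-PVC mapping.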
 

An example of the problem k8s faces with a Deployment:
Let's say you have 30 TMs and need 30 PVCs for them (an LVM volume for each TM, 
where the local RocksDB state is written).
How will k8s know which PVC to map to which TM? (If you don't use the template 
mechanism in StatefulSet, you have to specify a name in each PVC and map it 
to each TM.) And when a pod crashes and gets rebuilt by k8s, how can we reuse 
the previous PVC for it, or delete the old one and map a new one to it? Those 
things seem to be impossible with a Deployment and require us to use a 
StatefulSet.
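
The re-attachment problem can be illustrated with a toy model. This is only a 
sketch under stated assumptions: the class PvcRebindSketch, the pod names, and 
the claim names are all hypothetical, and the map stands in for whatever 
bookkeeping a "manual" approach would need. It is not how k8s or the operator 
is implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class PvcRebindSketch {
    // Toy bookkeeping: which claim was created for which pod name.
    private final Map<String, String> claimByPod = new HashMap<>();

    void bind(String podName, String claim) {
        claimByPod.put(podName, claim);
    }

    Optional<String> claimFor(String podName) {
        return Optional.ofNullable(claimByPod.get(podName));
    }

    public static void main(String[] args) {
        PvcRebindSketch cluster = new PvcRebindSketch();

        // StatefulSet-like: a replacement pod keeps its ordinal name "tm-0",
        // so a lookup by name finds the old claim again.
        cluster.bind("tm-0", "rocksdb-tm-0");
        System.out.println(cluster.claimFor("tm-0").isPresent()); // true

        // Deployment-like: pod names carry a generated suffix, so the
        // replacement pod has a new name and the old mapping is not found.
        cluster.bind("tm-7c9f-abcde", "rocksdb-manual-0");
        System.out.println(cluster.claimFor("tm-7c9f-xyz12").isPresent()); // false
    }
}
```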

Mounting an external volume for each TM is essential, and it has two 
advantages that cannot be ignored:
 * disk space - Large states storage
 * disk performance - Each TM can use the disk independently

> Support to mount a dynamically-created pvc for JM and TM in standalone mode 
> with StatefulSet.
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-29110
>                 URL: https://issues.apache.org/jira/browse/FLINK-29110
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>            Reporter: Peng Yuan
>            Priority: Major
>
> Use StatefulSet instead of Deployment to deploy JM and TM, to support mounting 
> a dynamically-created PersistentVolumeClaim.
> Add volumeClaimTemplates to JobManagerSpec and TaskManagerSpec:
> JobManagerSpec:
> {code:java}
> public class JobManagerSpec {
>     /** Resource specification for the JobManager pods. */
>     private Resource resource;
>     /** Number of JobManager replicas. Must be 1 for non-HA deployments. */
>     private int replicas = 1;
>     /** Volume Claim Templates for JobManager stateful set. Just for standalone mode. */
>     private List<PersistentVolumeClaim> volumeClaimTemplates = new ArrayList<>();
>     /** JobManager pod template. It will be merged with FlinkDeploymentSpec.podTemplate. */
>     private Pod podTemplate;
> }
>  {code}
> TaskManagerSpec:
> {code:java}
> public class TaskManagerSpec {
>     /** Resource specification for the TaskManager pods. */
>     private Resource resource;
>     /** Number of TaskManager replicas. If defined, takes precedence over parallelism. */
>     @SpecReplicas private Integer replicas;
>     /** Volume Claim Templates for TaskManager stateful set. Just for standalone mode. */
>     private List<PersistentVolumeClaim> volumeClaimTemplates = new ArrayList<>();
>     /** TaskManager pod template. It will be merged with FlinkDeploymentSpec.podTemplate. */
>     private Pod podTemplate;
> }
> {code}
>  
> volumeClaimTemplates is only available in standalone mode.
> CR Example:
> {code:yaml}
> kind: FlinkDeployment
> metadata:
>   namespace: default
>   name: basic-example
> spec:
>   image: flink:1.14.3
>   flinkVersion: v1_14
>   flinkConfiguration:
>     taskmanager.numberOfTaskSlots: "2"
>   serviceAccount: flink
>   jobManager:
>     replicas: 1
>     resource:
>       memory: "2048m"
>       cpu: 1
>     volumeClaimTemplates:
>       - metadata:
>           name: log
>         spec:
>           accessModes: [ "ReadWriteOnce" ]
>           storageClassName: "alicloud-local-lvm"
>           resources:
>             requests:
>               storage: 10Gi
>     podTemplate:
>       apiVersion: v1
>       kind: Pod
>       metadata:
>         name: job-manager-pod-template
>       spec:
>         containers:
>           - name: flink-main-container
>             volumeMounts:
>               - name: log
>                 mountPath: /opt/flink/log
>   taskManager:
>     replicas: 1 # only needed for standalone clusters
>     resource:
>       memory: "2048m"
>       cpu: 1
>     volumeClaimTemplates: 
>       - metadata:
>           name: log
>         spec:
>           accessModes: [ "ReadWriteOnce" ]
>           storageClassName: "alicloud-local-lvm"
>           resources:
>             requests:
>               storage: 10Gi
>     podTemplate:
>       apiVersion: v1
>       kind: Pod
>       metadata:
>         name: task-manager-pod-template
>       spec:
>         containers:
>           - name: flink-main-container
>             volumeMounts:
>               - name: log
>                 mountPath: /opt/flink/log
>   mode: standalone {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)