[ 
https://issues.apache.org/jira/browse/FLINK-29110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17596943#comment-17596943
 ] 

Yang Wang commented on FLINK-29110:
-----------------------------------

If the kubelet is started with a large local disk for its rootDir, then using 
an emptyDir for the logs and the RocksDB local state will be the best choice. 
However, I agree that using a StatefulSet has more benefits. For example, it 
can accelerate recovery by using the working directory[1], and a dedicated PV 
can be mounted for each TM to get better performance.
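
To make the StatefulSet point more concrete, here is a rough sketch (not an 
actual implementation in the operator) of why stable pod identities help. 
Kubernetes names StatefulSet pods "<statefulset-name>-<ordinal>" and names the 
PVC created from a volumeClaimTemplate "<template-name>-<pod-name>", so a 
restarted TM gets the same volume back. The Flink options are the ones I 
understand [1] to describe for local recovery across process restarts. The 
class name, the StatefulSet name "basic-example-taskmanager", the template 
name "workdir" and the mount path "/opt/flink/workdir" are all made up for 
illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: derives the stable names a StatefulSet-based TM could rely on. */
public class StatefulTaskManagerSketch {

    /** Kubernetes names StatefulSet pods "<statefulset-name>-<ordinal>". */
    public static String podName(String statefulSetName, int ordinal) {
        return statefulSetName + "-" + ordinal;
    }

    /** PVCs created from volumeClaimTemplates are named "<template-name>-<pod-name>". */
    public static String claimName(String templateName, String statefulSetName, int ordinal) {
        return templateName + "-" + podName(statefulSetName, ordinal);
    }

    /**
     * Flink options that let a restarted TaskManager find its previous local state.
     * Assumption: the PVC is mounted at mountPath inside the Flink container.
     */
    public static Map<String, String> flinkConfig(
            String statefulSetName, int ordinal, String mountPath) {
        Map<String, String> conf = new LinkedHashMap<>();
        // A deterministic resource ID, so the restarted TM is recognized as the same one.
        conf.put("taskmanager.resource-id", podName(statefulSetName, ordinal));
        // Working directory on the persistent volume instead of an ephemeral path.
        conf.put("process.taskmanager.working-dir", mountPath);
        // Recover from the local state copy before falling back to remote storage.
        conf.put("state.backend.local-recovery", "true");
        return conf;
    }

    public static void main(String[] args) {
        String statefulSet = "basic-example-taskmanager"; // hypothetical name
        for (int ordinal = 0; ordinal < 2; ordinal++) {
            System.out.println("pod:   " + podName(statefulSet, ordinal));
            System.out.println("claim: " + claimName("workdir", statefulSet, ordinal));
            flinkConfig(statefulSet, ordinal, "/opt/flink/workdir")
                    .forEach((key, value) -> System.out.println("  " + key + ": " + value));
        }
    }
}
```

With a Deployment the pod name changes on every restart, so neither the 
resource ID nor the claim could be derived this way.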

 

There is a related ticket, FLINK-24332, for native support. Since it is not a 
critical requirement, we have not put much effort into it. Please be aware 
that it is not easy to support the working directory in native K8s mode, 
because we always assume that a TaskManager pod will never be restarted and 
re-registered.

 

That said, I am not against trying this in standalone mode first.

 

[1]. 
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/standalone/working_directory/

> Support to mount a dynamically-created pvc for JM and TM in standalone mode 
> with StatefulSet.
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-29110
>                 URL: https://issues.apache.org/jira/browse/FLINK-29110
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>            Reporter: Peng Yuan
>            Priority: Major
>
> Use StatefulSet instead of Deployment to deploy the JM and TM, so that a 
> dynamically created PersistentVolumeClaim can be mounted.
> Add volumeClaimTemplates to JobManagerSpec and TaskManagerSpec:
> JobManagerSpec:
> {code:java}
> public class JobManagerSpec {
>     /** Resource specification for the JobManager pods. */
>     private Resource resource;
>     /** Number of JobManager replicas. Must be 1 for non-HA deployments. */
>     private int replicas = 1;
>     /** Volume Claim Templates for JobManager stateful set. Just for standalone mode. */
>     private List<PersistentVolumeClaim> volumeClaimTemplates = new ArrayList<>();
>     /** JobManager pod template. It will be merged with FlinkDeploymentSpec.podTemplate. */
>     private Pod podTemplate;
> }
>  {code}
> TaskManagerSpec:
> {code:java}
> public class TaskManagerSpec {
>     /** Resource specification for the TaskManager pods. */
>     private Resource resource;
>     /** Number of TaskManager replicas. If defined, takes precedence over parallelism */
>     @SpecReplicas private Integer replicas;
>     /** Volume Claim Templates for TaskManager stateful set. Just for standalone mode. */
>     private List<PersistentVolumeClaim> volumeClaimTemplates = new ArrayList<>();
>     /** TaskManager pod template. It will be merged with FlinkDeploymentSpec.podTemplate. */
>     private Pod podTemplate;
> } {code}
>  
> volumeClaimTemplates is only available in standalone mode.
> CR Example:
> {code:yaml}
> kind: FlinkDeployment
> metadata:
>   namespace: default
>   name: basic-example
> spec:
>   image: flink:1.14.3
>   flinkVersion: v1_14
>   flinkConfiguration:
>     taskmanager.numberOfTaskSlots: "2"
>   serviceAccount: flink
>   jobManager:
>     replicas: 1
>     resource:
>       memory: "2048m"
>       cpu: 1
>     volumeClaimTemplates:
>       - metadata:
>           name: log
>         spec:
>           accessModes: [ "ReadWriteOnce" ]
>           storageClassName: "alicloud-local-lvm"
>           resources:
>             requests:
>               storage: 10Gi
>     podTemplate:
>       apiVersion: v1
>       kind: Pod
>       metadata:
>         name: job-manager-pod-template
>       spec:
>         containers:
>           - name: flink-main-container
>             volumeMounts:
>               - name: log
>                 mountPath: /opt/flink/log
>   taskManager:
>     replicas: 1 # only needed for standalone clusters
>     resource:
>       memory: "2048m"
>       cpu: 1
>     volumeClaimTemplates: 
>       - metadata:
>           name: log
>         spec:
>           accessModes: [ "ReadWriteOnce" ]
>           storageClassName: "alicloud-local-lvm"
>           resources:
>             requests:
>               storage: 10Gi
>     podTemplate:
>       apiVersion: v1
>       kind: Pod
>       metadata:
>         name: task-manager-pod-template
>       spec:
>         containers:
>           - name: flink-main-container
>             volumeMounts:
>               - name: log
>                 mountPath: /opt/flink/log
>   mode: standalone {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
