litiliu commented on issue #10904:
URL: 
https://github.com/apache/dolphinscheduler/issues/10904#issuecomment-1306582252

   hi @caishunfeng @ikiler , this is the design of our implementation based on 
flink-kubernetes-operator.
   # Background
   We are using [Flink Kubernetes 
Operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/)
 to run Flink batch jar job on k8s cluster. 
   And all files in resource are stored on S3.
   
   # Summary design
   - In flink-task plugin's frontend, namespace & cluster & image & compute 
resources need to be specified by user to run this job. 
   - In the ds-worker, the CRD represents this batch job was built and 
submitted to k8s cluster by KubernetesClient.  
   - After that CRD has been submitted, a watcher is registered in ds-worker to 
monitor the status of that batch job(error or finish or timeout).
   After getting the final status of that job, that watcher will be closed.
   - The resources(jar or config files) are downloaded from S3 by the 
initContainer. Before the mainContainer run.
   
   # Detail design
   - In flink-task plugin's frontend, user need to specify the namespace & 
cluster & image in which to run that job.
   - In the ds-master, the resources(jar or config files)'s absolute path on S3 
will be calculated and set to context in the 
BaseTaskProcessor.getTaskExecutionContext()
   - In the ds-worker, the CRD of 
[FlinkDeployment](https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator-api/src/main/java/org/apache/flink/kubernetes/operator/api/FlinkDeployment.java)
     including the initContainer was built and submit to k8s cluster by 
KubernetesClient.
   - After that CRD has been submitted, a watcher is registered in ds-worker to 
monitor the status of that batch job(error or finish). A CountDownLatch is used 
to monitor job timeout.
   
   # limitation
   - Because we are using the "error" filed to monitor job status,only flink 
version >= 1.15 are supported, because of a flink bug, see [status no clear 
when deploying batch job with flink-k8s-operator
   ](https://lists.apache.org/thread/f9wr2l55p48w0b2q2mmtpojqjo2vzstm)
   
   
   Tks for discussion.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to