litiliu commented on issue #10904: URL: https://github.com/apache/dolphinscheduler/issues/10904#issuecomment-1306582252
hi @caishunfeng @ikiler, this is the design of our implementation based on flink-kubernetes-operator.

# Background

We are using [Flink Kubernetes Operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/) to run Flink batch jar jobs on a k8s cluster. All resource files are stored on S3.

# Summary design

- In the flink-task plugin's frontend, the user needs to specify the namespace, cluster, image, and compute resources to run the job.
- In the ds-worker, the CRD that represents the batch job is built and submitted to the k8s cluster by KubernetesClient.
- After the CRD has been submitted, a watcher is registered in the ds-worker to monitor the status of the batch job (error, finish, or timeout). Once the final status of the job is known, the watcher is closed.
- The resources (jar or config files) are downloaded from S3 by the initContainer before the mainContainer runs.

# Detail design

- In the flink-task plugin's frontend, the user needs to specify the namespace, cluster, and image in which to run the job.
- In the ds-master, the absolute S3 paths of the resources (jar or config files) are calculated and set in the context in BaseTaskProcessor.getTaskExecutionContext().
- In the ds-worker, the [FlinkDeployment](https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator-api/src/main/java/org/apache/flink/kubernetes/operator/api/FlinkDeployment.java) CRD, including the initContainer, is built and submitted to the k8s cluster by KubernetesClient.
- After the CRD has been submitted, a watcher is registered in the ds-worker to monitor the status of the batch job (error or finish). A CountDownLatch is used to monitor job timeout.

# Limitation

- Because we use the "error" field to monitor job status, only Flink versions >= 1.15 are supported. This is due to a Flink bug, see [status no clear when deploying batch job with flink-k8s-operator](https://lists.apache.org/thread/f9wr2l55p48w0b2q2mmtpojqjo2vzstm).

Thanks for the discussion.
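For illustration, the "watcher plus CountDownLatch" step from the detail design could look roughly like the sketch below. It is not the actual ds-worker code: the class `JobStatusWaiter`, the `onStatus` signature, and the `"FINISHED"` state string are assumptions made for the example, and the real watcher callback from KubernetesClient is left out so that the snippet only depends on the JDK.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper; names and signatures are illustrative only. */
final class JobStatusWaiter {

    /** The three outcomes named in the design: finish, error, or timeout. */
    enum Outcome { FINISHED, ERROR, TIMEOUT }

    private final CountDownLatch done = new CountDownLatch(1);
    private final AtomicReference<Outcome> outcome = new AtomicReference<>();

    /**
     * Intended to be called from the watcher callback on every status event.
     * Assumption: the event exposes a job state string and an "error" string.
     */
    void onStatus(String jobState, String error) {
        if (error != null && !error.isEmpty()) {
            complete(Outcome.ERROR);
        } else if ("FINISHED".equals(jobState)) {
            complete(Outcome.FINISHED);
        }
        // Any other state is treated as non-terminal, so we keep waiting.
    }

    /** Records the first terminal outcome and releases the waiting thread. */
    private void complete(Outcome o) {
        if (outcome.compareAndSet(null, o)) {
            done.countDown();
        }
    }

    /** Blocks until a terminal status arrives or the timeout elapses. */
    Outcome await(long timeout, TimeUnit unit) {
        try {
            if (!done.await(timeout, unit)) {
                return Outcome.TIMEOUT;
            }
            return outcome.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for job", e);
        }
    }

    public static void main(String[] args) {
        JobStatusWaiter waiter = new JobStatusWaiter();
        waiter.onStatus("FINISHED", null); // simulate a watcher event
        System.out.println(waiter.await(1, TimeUnit.SECONDS));
    }
}
```

In a real worker, the caller would close the watcher after `await` returns, whichever outcome it reports, which matches the "watcher is closed after the final status" step above.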
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
