Fei Feng created FLINK-34728:
--------------------------------
Summary: operator does not need to upload and download the jar
when deploying session job
Key: FLINK-34728
URL: https://issues.apache.org/jira/browse/FLINK-34728
Project: Flink
Issue Type: Improvement
Components: Deployment / Kubernetes, Kubernetes Operator
Affects Versions: kubernetes-operator-1.7.0, kubernetes-operator-1.6.0,
kubernetes-operator-1.5.0
Reporter: Fei Feng
Reading the source code for the first reconciliation of a session job in the
session mode of the Flink Kubernetes Operator reveals a clear bottleneck.
When submitting a session job, the operator must first [download the job jar from the
jarURL|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L827]
to the local storage of the operator's Kubernetes pod, then [upload the jar to the
JobManager through the `/jars/upload` REST API
|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L842],
and finally call the `/jars/:jarid/run` REST API to launch the job.
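The three steps above can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: the class name, the helper names, and the REST base address are hypothetical, and the jar id is assumed to be the last path segment of the `filename` field that `/jars/upload` returns. The sketch only builds the endpoints involved and does not perform any network I/O.

```java
import java.net.URI;

// Illustrative sketch of the current session-job submission flow.
// All names here are hypothetical; this is not the operator's implementation.
public class CurrentSessionSubmitSketch {

    // Step 2 target: the jar fetched in step 1 is pushed to the JobManager here.
    static URI uploadUri(String restBase) {
        return URI.create(restBase + "/jars/upload");
    }

    // Assumption: the upload response carries a "filename" path whose last
    // segment serves as the jar id for the run call.
    static String jarIdFromUploadedFilename(String filename) {
        return filename.substring(filename.lastIndexOf('/') + 1);
    }

    // Step 3 target: launches the job from the previously uploaded jar.
    static URI runUri(String restBase, String jarId) {
        return URI.create(restBase + "/jars/" + jarId + "/run");
    }

    public static void main(String[] args) {
        String restBase = "http://session-cluster-rest:8081"; // hypothetical address
        // Step 1 (not shown): the operator downloads the jar to its own pod.
        System.out.println("upload to: " + uploadUri(restBase));
        String jarId = jarIdFromUploadedFilename("/tmp/flink-web-upload/abc_job.jar");
        System.out.println("run via:   " + runUri(restBase, jarId));
    }
}
```

Note that the jar bytes pass through the operator pod in both step 1 and step 2, which is the cost the rest of this issue is concerned with.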
Every submission therefore moves the jar through the operator twice: once
inbound from the jar URL and once outbound to the JobManager. When multiple
jobs are submitted to the session cluster simultaneously, the operator becomes
a bottleneck, limited by the network bandwidth or other resource constraints of
the operator pod.
We can modify the job submission process in session mode. The JobManager
could provide a `/jars/run` REST API that downloads the job jar itself, so
the operator only needs to send a single REST request to submit the job,
without downloading and uploading the job jar. In this way, the submission
load on the operator is distributed across the JobManagers.
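As a rough sketch of what the operator side could look like under this proposal: the `/jars/run` endpoint does not exist yet, so the request shape below is purely an assumption. The `jarUrl` field name, the class name, and the REST base address are hypothetical. `entryClass` is borrowed from the existing run request only for illustration. The sketch builds the request without sending it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative sketch of the proposed single-request submission.
// The /jars/run endpoint and the "jarUrl" field are assumptions, not an existing API.
public class ProposedSessionSubmitSketch {

    // Minimal JSON string escaping for quotes and backslashes only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // The body carries only the jar's location; the JobManager would fetch it itself.
    static String requestBody(String jarUrl, String entryClass) {
        return "{\"jarUrl\":\"" + escape(jarUrl) + "\","
                + "\"entryClass\":\"" + escape(entryClass) + "\"}";
    }

    // Builds (but does not send) the single request the operator would issue.
    static HttpRequest buildRequest(String restBase, String jarUrl, String entryClass) {
        return HttpRequest.newBuilder(URI.create(restBase + "/jars/run"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(jarUrl, entryClass)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest(
                "http://session-cluster-rest:8081",        // hypothetical address
                "https://example.com/jobs/my-job.jar",     // hypothetical jar location
                "com.example.MyJob");                      // hypothetical entry class
        System.out.println(request.method() + " " + request.uri());
    }
}
```

With a request like this, the operator sends only a few hundred bytes per submission regardless of jar size, and each JobManager bears the download cost for its own jobs.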
--
This message was sent by Atlassian Jira
(v8.20.10#820010)