[jira] [Updated] (FLINK-34728) operator does not need to upload and download the jar when deploying session job

Fei Feng (Jira) Tue, 19 Mar 2024 01:04:24 -0700


     [https://issues.apache.org/jira/browse/FLINK-34728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]


Fei Feng updated FLINK-34728:
-----------------------------
    Description: 
Problem:

Reading the source code of the session job's first reconciliation in the 
Flink Kubernetes Operator's session mode reveals a clear single-point 
bottleneck. When submitting a session job, the operator first downloads the 
job jar from the jarURL to the local storage of its Kubernetes pod, then 
uploads the jar to the JobManager through the `/jars/upload` REST API, and 
finally calls the `/jars/:jarid/run` REST API to launch the job.
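A minimal Java sketch of the three steps above, using only `java.net.http` from the JDK (Java 11+). It is not the operator's actual code (that lives in `AbstractFlinkService`); the class and method names are hypothetical, the multipart field name `jarfile` follows the `curl` example in the Flink REST API docs, the run request body is passed in as raw JSON, and status-code checks are omitted. What it illustrates is that every jar byte is first received and then re-sent by the operator process.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of today's session-job submission path (not the operator's code).
final class SessionJobSubmitSketch {

    // Matches the "filename" field of the /jars/upload JSON response.
    private static final Pattern FILENAME =
            Pattern.compile("\"filename\"\\s*:\\s*\"([^\"]+)\"");

    // The jar id expected by /jars/:jarid/run is the last path segment of "filename".
    static String jarIdFromUploadResponse(String json) {
        Matcher m = FILENAME.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no filename in upload response: " + json);
        }
        String filename = m.group(1);
        return filename.substring(filename.lastIndexOf('/') + 1);
    }

    // Builds a multipart/form-data body containing a single "jarfile" part.
    static byte[] multipartBody(String boundary, String fileName, byte[] jar) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"jarfile\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n";
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(jar);
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Download -> upload -> run. The jar crosses the operator pod's network twice.
    static String submit(HttpClient http, URI jarUrl, URI restBase, String runBodyJson)
            throws IOException, InterruptedException {
        Path local = Files.createTempFile("session-job-", ".jar");
        try {
            // Step 1: download the jar from jarURL into the operator pod's local storage.
            http.send(HttpRequest.newBuilder(jarUrl).GET().build(),
                    HttpResponse.BodyHandlers.ofFile(local));

            // Step 2: upload the same bytes to the JobManager via POST /jars/upload.
            String boundary = "sketch-" + System.nanoTime();
            byte[] body = multipartBody(boundary, local.getFileName().toString(),
                    Files.readAllBytes(local));
            HttpRequest upload = HttpRequest.newBuilder(restBase.resolve("/jars/upload"))
                    .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                    .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                    .build();
            String jarId = jarIdFromUploadResponse(
                    http.send(upload, HttpResponse.BodyHandlers.ofString()).body());

            // Step 3: launch the job via POST /jars/:jarid/run.
            HttpRequest run = HttpRequest.newBuilder(restBase.resolve("/jars/" + jarId + "/run"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(runBodyJson))
                    .build();
            return http.send(run, HttpResponse.BodyHandlers.ofString()).body();
        } finally {
            // Remove the local copy whether or not submission succeeded.
            Files.deleteIfExists(local);
        }
    }
}
```

With N jobs submitted at once, `submit` runs N times in the operator, so the operator pod carries the full size of all N jars in and out.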

In this process, the operator first downloads the jar and then uploads it, so 
every job jar passes through the operator pod. When multiple jobs are 
submitted to the session cluster simultaneously, the operator can become a 
single-point bottleneck, limited by the network bandwidth or other resource 
constraints of the operator pod.

 

[https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L824]

!image-2024-03-19-15-59-20-933.png|width=548,height=432!

 

Solution:

We can modify the job submission process in session mode. The JobManager 
could provide a `/jars/run` REST API that downloads the job jar itself, so the 
operator only needs to send a single REST request to submit the job, without 
downloading and uploading the job jar. In this way, the submission load on the 
operator is distributed across the JobManagers. 
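Under this proposal, the operator's side shrinks to one small JSON request. The sketch below assumes the proposed `/jars/run` endpoint accepts the jar location in a field named `jarURI`; both the endpoint and that field are hypothetical and do not exist in the Flink REST API today. `entryClass` and `parallelism` are borrowed from the existing `/jars/:jarid/run` request body, and the class name is illustrative only.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch of the proposed single-request submission; the /jars/run
// endpoint and the "jarURI" field are assumptions, not existing Flink REST API.
final class ProposedJarRunSketch {

    // Minimal JSON string escaping for backslashes and double quotes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Request body: where the JobManager should fetch the jar, plus run options.
    static String runBody(String jarUri, String entryClass, int parallelism) {
        return "{\"jarURI\":\"" + escape(jarUri) + "\","
                + "\"entryClass\":\"" + escape(entryClass) + "\","
                + "\"parallelism\":" + parallelism + "}";
    }

    // Builds POST /jars/run; the operator attaches no jar bytes.
    static HttpRequest request(URI restBase, String body) {
        return HttpRequest.newBuilder(restBase.resolve("/jars/run"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Sends the request; the JobManager would then download the jar itself.
    static String submit(HttpClient http, URI restBase, String body)
            throws IOException, InterruptedException {
        return http.send(request(restBase, body), HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

The download moves to whichever JobManager receives the request, which is what spreads the load; that JobManager would presumably also need network access to the jar's location.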

  was:
Reading the source code of the session job's first reconciliation in the 
Flink Kubernetes Operator's session mode reveals a clear single-point 
bottleneck. When submitting a session job, the operator first
[downloads the job jar from the jarURL|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L827]
to the local storage of its Kubernetes pod, then
[uploads the jar to the JobManager through the `/jars/upload` REST API|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L842],
and finally calls the `/jars/:jarid/run` REST API to launch the job.

In this process, the operator first downloads the jar and then uploads it, so 
every job jar passes through the operator pod. When multiple jobs are 
submitted to the session cluster simultaneously, the operator can become a 
single-point bottleneck, limited by the network bandwidth or other resource 
constraints of the operator pod.

We can modify the job submission process in session mode. The JobManager 
could provide a `/jars/run` REST API that downloads the job jar itself, so the 
operator only needs to send a single REST request to submit the job, without 
downloading and uploading the job jar. In this way, the submission load on the 
operator is distributed across the JobManagers. 


> operator does not need to upload and download the jar when deploying session 
> job
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-34728
>                 URL: https://issues.apache.org/jira/browse/FLINK-34728
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes, Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.5.0, kubernetes-operator-1.6.0, kubernetes-operator-1.7.0
>            Reporter: Fei Feng
>            Priority: Major
>         Attachments: image-2024-03-19-15-59-20-933.png



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
