[jira] [Updated] (FLINK-33683) Improve the performance of submitting jobs and fetching results to a running flink cluster

xiangyu feng (Jira) Wed, 29 Nov 2023 10:23:06 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-33683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


xiangyu feng updated FLINK-33683:
---------------------------------
    Description: 
Submitting jobs to, and fetching results from, a long-running Flink cluster 
currently involves a lot of unnecessary overhead. The current behavior works 
well for streaming and batch jobs, because in these scenarios users do not 
frequently submit jobs to, or fetch results from, a running cluster.

 

But in OLAP scenarios, users continuously submit lots of short-lived jobs 
to the running cluster. In this situation, the overhead has a huge 
impact on the E2E performance. Here are some examples of unnecessary overhead:
 * Each `RemoteExecutor` will create a new `StandaloneClusterDescriptor` when 
executing a job on the same remote cluster
 * `StandaloneClusterDescriptor` will always create a new `RestClusterClient` 
when retrieving an existing Flink Cluster
 * Each `RestClusterClient` will create a new `ClientHighAvailabilityServices`, 
which might contain a resource-consuming HA client (ZKClient or KubeClient) and 
a time-consuming leader retrieval operation
 * `RestClient` will create a new connection for every request which costs 
extra connection establishment time
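
The reuse idea behind the examples above can be illustrated with a minimal 
sketch. It assumes a client can safely be shared by all jobs that target the 
same cluster address. `ClusterClientCache` and `FakeClient` are hypothetical 
names used only for illustration; they are not Flink classes, and the subtasks 
may well take a different approach.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache holding one expensive client per cluster address. Not a Flink API. */
final class ClusterClientCache<C> {
    private final Map<String, C> clients = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    ClusterClientCache(Function<String, C> factory) {
        this.factory = factory;
    }

    /** Returns the cached client for the address, creating it only on first use. */
    C getOrCreate(String clusterAddress) {
        return clients.computeIfAbsent(clusterAddress, factory);
    }
}

/** Stand-in for a client whose construction is costly (HA services, leader retrieval). */
final class FakeClient {
    // Counts how many times the "expensive" constructor ran.
    static final AtomicInteger CREATED = new AtomicInteger();
    final String address;

    FakeClient(String address) {
        this.address = address;
        CREATED.incrementAndGet();
    }
}

public class ClientReuseSketch {
    public static void main(String[] args) {
        ClusterClientCache<FakeClient> cache = new ClusterClientCache<>(FakeClient::new);
        // Simulates many short-lived OLAP jobs submitted to the same cluster.
        for (int i = 0; i < 1000; i++) {
            cache.getOrCreate("jobmanager:8081");
        }
        System.out.println("clients created: " + FakeClient.CREATED.get());
    }
}
```

In this sketch the factory runs only on the first lookup for an address, so 
repeated submissions to the same cluster pay the setup cost once. A real 
implementation would also need to close cached clients and handle leader 
changes, which the sketch leaves out.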

 

Therefore, I suggest creating this ticket and the following subtasks to improve 
this performance. This ticket is also related to FLINK-25318.

  was:
There is now a lot of unnecessary overhead involved in submitting jobs and 
fetching results to a long-running flink cluster. This works well for streaming 
and batch job, because in these scenarios users will not frequently submit jobs 
and fetch result to a running cluster.

 

But in OLAP scenario, users will continuously submit lots of short-lived jobs 
to the running cluster. In this situation, these overhead will have a huge 
impact on the E2E performance.  Here are some examples of unnecessary overhead:
 * Each `RemoteExecutor` will create a new `StandaloneClusterDescriptor` when 
executing a job on the same remote cluster
 * `StandaloneClusterDescriptor` will always create a new `RestClusterClient` 
when retrieving an existing Flink Cluster
 * Each `RestClusterClient` will create a new `ClientHighAvailabilityServices` 
which might contains a resource-consuming ha client(ZKClient or KubeClient) and 
a time-consuming leader retrieval operation
 * `RestClient` will create a new connection for every request which costs 
extra connection establishment time

 

Therefore, I suggest creating this ticket and following subtasks to improve 
this performance. This ticket is also relates to  FLINK-25318.


> Improve the performance of submitting jobs and fetching results to a running 
> flink cluster
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-33683
>                 URL: https://issues.apache.org/jira/browse/FLINK-33683
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client / Job Submission, Table SQL / Client
>            Reporter: xiangyu feng
>            Priority: Major
>
> Submitting jobs to, and fetching results from, a long-running Flink cluster 
> currently involves a lot of unnecessary overhead. The current behavior works 
> well for streaming and batch jobs, because in these scenarios users do not 
> frequently submit jobs to, or fetch results from, a running cluster.
>  
> But in OLAP scenarios, users continuously submit lots of short-lived jobs 
> to the running cluster. In this situation, the overhead has a huge 
> impact on the E2E performance. Here are some examples of unnecessary 
> overhead:
>  * Each `RemoteExecutor` will create a new `StandaloneClusterDescriptor` when 
> executing a job on the same remote cluster
>  * `StandaloneClusterDescriptor` will always create a new `RestClusterClient` 
> when retrieving an existing Flink Cluster
>  * Each `RestClusterClient` will create a new 
> `ClientHighAvailabilityServices`, which might contain a resource-consuming HA 
> client (ZKClient or KubeClient) and a time-consuming leader retrieval operation
>  * `RestClient` will create a new connection for every request which costs 
> extra connection establishment time
>  
> Therefore, I suggest creating this ticket and the following subtasks to improve 
> this performance. This ticket is also related to FLINK-25318.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
