[ 
https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangyu feng updated FLINK-32756:
---------------------------------
    Description: 
In OLAP scenarios, we submit queries to a Flink session cluster through the 
flink-sql-gateway service. On receiving queries, the gateway service creates 
sessions to handle them, and each session creates a new RestClusterClient and 
a new ClientHAServices instance.

In production, we have enabled JobManager ZooKeeper (ZK) HA and use 
ZooKeeperClientHAServices for service discovery. Each 
ZooKeeperClientHAServices instance establishes a network connection to ZK and 
creates four ZK-related threads.

When QPS reaches 200, more than 1000 sessions are created in a single 
flink-sql-gateway instance, which means more than 1000 ZK connections and 4000 
ZK-related threads exist simultaneously. This poses a significant stability 
risk in production.

To address this problem, we have created SharedZKClientHAService so that 
different sessions share a single ZK connection and ZKClient.
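
The sharing idea can be illustrated with a reference-counted holder: the first 
session to acquire the resource creates it, later sessions reuse it, and the 
last release closes it. This is only a minimal sketch under stated 
assumptions, not the SharedZKClientHAService implementation. The names 
SharedResource, FakeZkConnection, and SharedClientDemo are hypothetical, and 
FakeZkConnection merely counts open instances instead of talking to ZK.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for an expensive client such as a ZK connection. */
final class FakeZkConnection implements AutoCloseable {
    /** Number of connections currently open, for demonstration only. */
    static final AtomicInteger OPEN = new AtomicInteger();

    FakeZkConnection() {
        OPEN.incrementAndGet();
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

/** Hypothetical reference-counted holder: first acquire creates, last release closes. */
final class SharedResource<T extends AutoCloseable> {
    private final Supplier<T> factory;
    private T instance;
    private int refCount;

    SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it if no one holds it yet. */
    synchronized T acquire() {
        if (refCount == 0) {
            instance = factory.get();
        }
        refCount++;
        return instance;
    }

    /** Drops one reference and closes the instance when the last holder leaves. */
    synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException("release() called without acquire()");
        }
        if (--refCount == 0) {
            T toClose = instance;
            instance = null;
            try {
                toClose.close();
            } catch (Exception e) {
                throw new IllegalStateException("failed to close shared resource", e);
            }
        }
    }
}

final class SharedClientDemo {
    public static void main(String[] args) {
        SharedResource<FakeZkConnection> shared =
                new SharedResource<>(FakeZkConnection::new);

        // Simulate 1000 gateway sessions that each need a connection.
        for (int i = 0; i < 1000; i++) {
            shared.acquire();
        }
        System.out.println("open connections: " + FakeZkConnection.OPEN.get());

        // Sessions finish; the connection is closed only after the last release.
        for (int i = 0; i < 1000; i++) {
            shared.release();
        }
        System.out.println("open connections: " + FakeZkConnection.OPEN.get());
    }
}
```

With this pattern the number of connections, and therefore of connection 
threads, stays constant as sessions grow, instead of scaling as one 
connection and four threads per session.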

  was:
In OLAP scenarios, we submit queries to a Flink session cluster through the 
flink-sql-gateway service. On receiving queries, the gateway service creates 
sessions to handle them, and each session creates a new RestClusterClient and 
a new ClientHAServices instance.

In production, we have enabled JobManager ZooKeeper (ZK) HA and use 
ZooKeeperClientHAServices for service discovery. Each 
ZooKeeperClientHAServices instance establishes a network connection to ZK and 
creates four ZK-related threads.

When QPS reaches 200, more than 1000 sessions are created in a single 
flink-sql-gateway instance, which means more than 1000 ZK connections and more 
than 4000 ZK-related threads exist simultaneously. This poses a significant 
stability risk in production.

To address this problem, we have created SharedZKClientHAService so that 
different sessions share a single ZK connection and ZKClient.


> Reuse ClientHighAvailabilityServices in RestClusterClient when submitting 
> OLAP jobs
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-32756
>                 URL: https://issues.apache.org/jira/browse/FLINK-32756
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Client / Job Submission
>            Reporter: xiangyu feng
>            Priority: Major
>
> In OLAP scenarios, we submit queries to a Flink session cluster through the 
> flink-sql-gateway service. On receiving queries, the gateway service creates 
> sessions to handle them, and each session creates a new RestClusterClient 
> and a new ClientHAServices instance.
>
> In production, we have enabled JobManager ZooKeeper (ZK) HA and use 
> ZooKeeperClientHAServices for service discovery. Each 
> ZooKeeperClientHAServices instance establishes a network connection to ZK 
> and creates four ZK-related threads.
>
> When QPS reaches 200, more than 1000 sessions are created in a single 
> flink-sql-gateway instance, which means more than 1000 ZK connections and 
> 4000 ZK-related threads exist simultaneously. This poses a significant 
> stability risk in production.
>
> To address this problem, we have created SharedZKClientHAService so that 
> different sessions share a single ZK connection and ZKClient.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)