[ 
https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangyu feng updated FLINK-32756:
---------------------------------
    Description: 
In the OLAP scenario, we submit queries to a Flink session cluster through 
the flink-sql-gateway service. When it receives queries, the gateway service 
creates sessions to handle them. Each session creates a new RestClusterClient 
to submit queries and a new ClientHAServices to discover the latest address 
of the JobManager.

In production, we have enabled JobManager HA and use ZKClientHAServices for 
service discovery. Each ZKClientHAServices instance establishes its own 
network connection to ZK and creates four ZK-related threads. 

When QPS reaches 200, more than 1000 sessions are created in a single 
flink-sql-gateway instance, which means more than 1000 ZK connections and 
more than 4000 ZK-related threads (four per connection) exist at the same 
time. This poses a significant stability risk in production.
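
As a rough illustration of the reuse idea in the title, sessions that target 
the same ZK quorum could share one reference-counted connection instead of 
each opening their own. The sketch below is not Flink or Curator code: 
SharedConnectionPool, acquire, release and the "zk-quorum:2181" address are 
hypothetical names, and the connection type is left generic so the example 
runs without any ZK dependency.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch: one shared, reference-counted connection per ZK quorum. */
final class SharedConnectionPool<C> {

    private static final class Entry<C> {
        final C connection;
        int refCount;

        Entry(C connection) {
            this.connection = connection;
        }
    }

    private final Map<String, Entry<C>> entries = new HashMap<>();
    private final Function<String, C> factory; // opens a connection for a quorum address
    private final Consumer<C> closer;          // closes a connection that is no longer used

    SharedConnectionPool(Function<String, C> factory, Consumer<C> closer) {
        this.factory = factory;
        this.closer = closer;
    }

    /** Returns the shared connection for the quorum, creating it on first use. */
    synchronized C acquire(String quorum) {
        Entry<C> entry = entries.get(quorum);
        if (entry == null) {
            entry = new Entry<>(factory.apply(quorum));
            entries.put(quorum, entry);
        }
        entry.refCount++;
        return entry.connection;
    }

    /** Drops one reference; the connection is closed when the last user releases it. */
    synchronized void release(String quorum) {
        Entry<C> entry = entries.get(quorum);
        if (entry == null) {
            throw new IllegalStateException("No connection acquired for " + quorum);
        }
        if (--entry.refCount == 0) {
            entries.remove(quorum);
            closer.accept(entry.connection);
        }
    }

    synchronized int openConnections() {
        return entries.size();
    }

    public static void main(String[] args) {
        int[] created = {0};
        SharedConnectionPool<String> pool = new SharedConnectionPool<>(
                quorum -> { created[0]++; return "connection-to-" + quorum; },
                connection -> { /* a real client would be closed here */ });

        // Simulate 1000 gateway sessions that all use the same quorum.
        for (int session = 0; session < 1000; session++) {
            pool.acquire("zk-quorum:2181");
        }
        System.out.println("sessions=1000, connections created=" + created[0]);
    }
}
```

Under these assumptions, 1000 sessions would hold a single connection and its 
four threads rather than 1000 connections and about 4000 threads. How the 
shared instance is scoped and when it is closed are design questions for the 
actual change.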

> Reuse ZK connections when submitting OLAP jobs to Flink session cluster
> -----------------------------------------------------------------------
>
>                 Key: FLINK-32756
>                 URL: https://issues.apache.org/jira/browse/FLINK-32756
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Client / Job Submission
>            Reporter: xiangyu feng
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)