[
https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
xiangyu feng updated FLINK-32756:
---------------------------------
Summary: Reuse ClientHighAvailabilityServices in RestClusterClient when
submitting OLAP jobs (was: Reues ZK connections when submitting OLAP jobs to
Flink session cluster)
> Reuse ClientHighAvailabilityServices in RestClusterClient when submitting
> OLAP jobs
> -----------------------------------------------------------------------------------
>
> Key: FLINK-32756
> URL: https://issues.apache.org/jira/browse/FLINK-32756
> Project: Flink
> Issue Type: Sub-task
> Components: Client / Job Submission
> Reporter: xiangyu feng
> Priority: Major
>
> In the OLAP scenario, we submit queries to a Flink session cluster through
> the flink-sql-gateway service. When it receives queries, the gateway service
> creates sessions to handle them. Each session creates a new
> RestClusterClient to submit queries and a new ClientHAServices to discover
> the latest address of the JobManager.
> In our production deployment, we have enabled JobManager HA and use
> ZKClientHAServices for service discovery. Each ZKClientHAServices instance
> establishes a network connection to ZK and creates four ZK-related threads.
> When QPS reaches 200, more than 1000 sessions are created in a single
> flink-sql-gateway instance, which means more than 1000 ZK connections and
> more than 4000 ZK-related threads exist simultaneously. This poses
> a significant stability risk in production.
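
To illustrate the reuse idea named in the summary, here is a minimal, self-contained sketch. It does not use Flink's actual classes: DiscoveryService, FakeZkDiscovery, and SharedDiscovery are hypothetical stand-ins, and the reference-counting scheme is only one assumed way such sharing could work, not necessarily what the eventual patch does. It compares how many discovery "connections" get opened for 1000 sessions with and without sharing.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SharedHaServicesSketch {

    /** Hypothetical stand-in for a client-side HA service that owns one ZK connection. */
    interface DiscoveryService extends AutoCloseable {
        String leaderAddress();

        @Override
        void close();
    }

    /** Counts how many fake "connections" were opened. */
    static final AtomicInteger OPENED = new AtomicInteger();

    /** Hypothetical fake: each instance represents one connection (and its threads). */
    static final class FakeZkDiscovery implements DiscoveryService {
        FakeZkDiscovery() {
            OPENED.incrementAndGet();
        }

        @Override
        public String leaderAddress() {
            return "jobmanager:8081"; // placeholder address, not a real endpoint
        }

        @Override
        public void close() {
            // A real service would close its connection here.
        }
    }

    /** Reference-counted holder: creates the service on first acquire, closes it on last release. */
    static final class SharedDiscovery {
        private final Supplier<DiscoveryService> factory;
        private DiscoveryService instance;
        private int refCount;

        SharedDiscovery(Supplier<DiscoveryService> factory) {
            this.factory = factory;
        }

        synchronized DiscoveryService acquire() {
            if (refCount == 0) {
                instance = factory.get();
            }
            refCount++;
            return instance;
        }

        synchronized void release() {
            if (refCount == 0) {
                throw new IllegalStateException("release without acquire");
            }
            if (--refCount == 0) {
                instance.close();
                instance = null;
            }
        }
    }

    /** Simulates the given number of concurrently open sessions and returns the connections opened. */
    public static int openConnectionsFor(int sessions, boolean share) {
        OPENED.set(0);
        SharedDiscovery shared = new SharedDiscovery(FakeZkDiscovery::new);
        for (int i = 0; i < sessions; i++) {
            // Without sharing, every session builds its own discovery service.
            DiscoveryService d = share ? shared.acquire() : new FakeZkDiscovery();
            d.leaderAddress();
        }
        return OPENED.get();
    }

    public static void main(String[] args) {
        System.out.println("per-session: " + openConnectionsFor(1000, false));
        System.out.println("shared:      " + openConnectionsFor(1000, true));
    }
}
```

In this sketch, 1000 sessions open 1000 connections when each creates its own service, and a single connection when they share one. At four threads per connection, as described above, that is the difference between roughly 4000 threads and 4.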
--
This message was sent by Atlassian Jira
(v8.20.10#820010)