[
https://issues.apache.org/jira/browse/IMPALA-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687018#comment-16687018
]
Michael Ho commented on IMPALA-4069:
------------------------------------
Now that IMPALA-7213 and IMPALA-4063 are fixed, the number of connections
should scale linearly with the number of hosts instead of with (# of hosts x
# of query fragments per host). It may still be worth studying whether this
remains an issue in a Kerberos-enabled cluster, or whether we should eagerly
do a staggered warm-up in the Impala cluster to pre-create the connections.
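
To make the scaling difference concrete, here is a rough back-of-the-envelope
sketch. The function names and the sample numbers (100 hosts, 10 fragments per
host) are illustrative assumptions, not taken from Impala's code; the two
formulas are simply the ones stated above.

```python
def connections_before(num_hosts: int, fragments_per_host: int) -> int:
    """Approximate connection count when it grows with hosts x fragments per host."""
    return num_hosts * fragments_per_host


def connections_after(num_hosts: int) -> int:
    """Approximate connection count when it grows linearly with the number of hosts."""
    return num_hosts


# Hypothetical cluster: 100 hosts, 10 query fragments per host.
hosts = 100
fragments = 10
print(connections_before(hosts, fragments))  # 1000
print(connections_after(hosts))              # 100
```

Under these assumed numbers the burst of connections drops by a factor equal
to the number of fragments per host, which is why the remaining question is
mostly about the cost of each new connection (for example with Kerberos).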
> Introduce startup option to create and cache backend connections on startup
> ---------------------------------------------------------------------------
>
> Key: IMPALA-4069
> URL: https://issues.apache.org/jira/browse/IMPALA-4069
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 2.5.0
> Reporter: Mostafa Mokhtar
> Priority: Major
> Labels: scalability
>
> Add impalad startup flag specifying the number of connections per backend to
> create and cache.
> After startup impala-server.backends.client-cache.total-clients should
> reflect number of backends x cached connections per backend.
> [[email protected]] description of the problem
> {code}
> Internal Impala network connections between nodes for query execution are not
> multiplexed. This means that as the number of queries increases, the number
> of network connections between Impala executors increases. With more nodes,
> the combination of query bursts and number of executors can lead to many
> new connection attempts. For example, a query with 10+ joins on a 100-node
> cluster could require 1000+ connections simultaneously on the coordinator.
> When the spike is too high or there is not sufficient CPU available to handle
> the burst, this causes connection failures.
> The total number of connections does not seem to be the issue, but there is
> currently a practical limit on the number of new TCP connection requests
> that can be handled at once.
> Impala caches backend connections and reuses them later. With the cache, a
> spike of simultaneous new connection requests covers only those above the
> previously established maximum.
> {code}
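
The caching behaviour in the quoted description can be illustrated with a toy
model. This is only a sketch under stated assumptions: the class
`ClientCacheModel`, its `precreated` parameter, and the burst sizes are
hypothetical, cached connections are assumed never to be closed, and nothing
here reflects Impala's actual client cache implementation.

```python
class ClientCacheModel:
    """Toy model of a connection cache: connections are reused and never closed."""

    def __init__(self, precreated: int = 0):
        # Number of connections already established (0 means a cold cache).
        self.total_clients = precreated

    def handle_burst(self, needed: int) -> int:
        """Return how many NEW connections a burst of `needed` simultaneous
        connections must open, given what is already cached."""
        new = max(0, needed - self.total_clients)
        self.total_clients += new
        return new


# Hypothetical sequence of simultaneous connection demands.
bursts = [200, 150, 1000, 800, 1000]

cold = ClientCacheModel()
print([cold.handle_burst(n) for n in bursts])    # [200, 0, 800, 0, 0]

# Pre-creating connections at startup (the proposal in this issue).
warm = ClientCacheModel(precreated=1000)
print([warm.handle_burst(n) for n in bursts])    # [0, 0, 0, 0, 0]
```

In the cold case, new connections are opened only when a burst exceeds the
previous maximum. In the pre-warmed case, no burst up to the pre-created count
opens new connections, which is the effect the proposed startup option aims
for.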