[
https://issues.apache.org/jira/browse/IMPALA-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sailesh Mukil resolved IMPALA-4456.
-----------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.12.0
> Address scalability issues of qs_map_lock_ and client_request_state_map_lock_
> -----------------------------------------------------------------------------
>
> Key: IMPALA-4456
> URL: https://issues.apache.org/jira/browse/IMPALA-4456
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.7.0
> Reporter: Sailesh Mukil
> Assignee: Sailesh Mukil
> Priority: Major
> Labels: performance, scalability
> Fix For: Impala 2.12.0
>
>
> The process-wide client_request_state_map_lock_ can be a highly contended
> lock. It is currently used as a plain mutex.
> Changing it to a reader-writer lock, and taking it in writer mode only when
> strictly necessary, can give us some big wins with relatively little effort.
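> For illustration, a minimal sketch of the RW-lock idea (not Impala's actual
> code; the class and method names are made up for this sketch, and the shared
> mutex plays the role of client_request_state_map_lock_):
> {code:cpp}
> // Lookups take the shared (reader) lock so they can proceed concurrently;
> // only registering/removing a query takes the exclusive (writer) lock.
> #include <map>
> #include <memory>
> #include <string>
> #include <boost/thread/shared_mutex.hpp>
> #include <boost/thread/locks.hpp>
>
> struct ClientRequestState;      // opaque for this sketch
> typedef std::string TUniqueId;  // stand-in for Impala's TUniqueId
>
> class RequestStateMap {
>  public:
>   std::shared_ptr<ClientRequestState> Get(const TUniqueId& query_id) {
>     boost::shared_lock<boost::shared_mutex> l(lock_);  // reader lock
>     auto it = map_.find(query_id);
>     if (it == map_.end()) return nullptr;
>     return it->second;
>   }
>   void Add(const TUniqueId& query_id,
>       std::shared_ptr<ClientRequestState> state) {
>     boost::unique_lock<boost::shared_mutex> l(lock_);  // writer lock
>     map_[query_id] = state;
>   }
>  private:
>   boost::shared_mutex lock_;  // plays the role of client_request_state_map_lock_
>   std::map<TUniqueId, std::shared_ptr<ClientRequestState>> map_;
> };
> {code}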
> On running some tests, I've also noticed that changing it to an RW lock
> reduces the number of clients created for use between nodes. The reason is that
> ReportExecStatus(), which runs in the context of a thrift connection thread,
> waits for fairly long periods of time on the client_request_state_map_lock_
> on heavily loaded systems, causing the RPC sender to block on the RPC. This
> in turn forces other threads on the sender node to create new client
> connections to send RPCs instead of reusing old ones, which contributes to
> nodes hitting their connection limit.
> So, this relatively small change can give us wins not just in terms of
> performance, but in terms of scalability too.
> EDIT: Trying a WIP patch with RW locks on a larger cluster has shown that it
> works well (very well) in some cases, but regresses at other times. The
> reason is that we can't decide whether to starve readers or writers, and
> performance varies (drastically) with the workload and the starvation
> option chosen. We need to come up with other ideas to address this issue (a
> suggestion is to have some sort of bucketed locking scheme, where we split
> the query_exec_state_map_ into buckets, each bucket with its own lock,
> thereby reducing lock contention).
> EDIT 2 (10/23/17): A simple approach that's tried and tested is to shard the
> lock and the map it protects, which allows for better parallel access to
> the client request states. Also, more recent perf runs have shown
> qs_map_lock_ to be a frequent point of contention too, so we're accounting
> for that in this JIRA as well.
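> To make the bucketed/sharded idea above concrete, here's a minimal sketch
> (illustrative only, not the committed patch; the shard count, class and
> method names are made up): each shard gets its own lock and its own slice of
> the map, so accesses to different query ids rarely contend.
> {code:cpp}
> #include <array>
> #include <functional>
> #include <map>
> #include <memory>
> #include <mutex>
> #include <string>
>
> struct ClientRequestState;      // opaque for this sketch
> typedef std::string TUniqueId;  // stand-in for Impala's TUniqueId
>
> class ShardedRequestStateMap {
>  public:
>   std::shared_ptr<ClientRequestState> Get(const TUniqueId& query_id) {
>     Shard& s = GetShard(query_id);
>     std::lock_guard<std::mutex> l(s.lock);
>     auto it = s.map.find(query_id);
>     if (it == s.map.end()) return nullptr;
>     return it->second;
>   }
>
>   void Add(const TUniqueId& query_id,
>       std::shared_ptr<ClientRequestState> state) {
>     Shard& s = GetShard(query_id);
>     std::lock_guard<std::mutex> l(s.lock);
>     s.map[query_id] = state;
>   }
>
>  private:
>   // Shard count is a tuning knob: more shards means less contention, but
>   // more work for any operation that must visit every query.
>   static const int kNumShards = 16;
>
>   struct Shard {
>     std::mutex lock;  // per-shard lock instead of one process-wide lock
>     std::map<TUniqueId, std::shared_ptr<ClientRequestState>> map;
>   };
>
>   Shard& GetShard(const TUniqueId& query_id) {
>     return shards_[std::hash<TUniqueId>()(query_id) % kNumShards];
>   }
>
>   std::array<Shard, kNumShards> shards_;
> };
> {code}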
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)