[ 
https://issues.apache.org/jira/browse/IMPALA-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailesh Mukil resolved IMPALA-4456.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

> Address scalability issues of qs_map_lock_ and client_request_state_map_lock_
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-4456
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4456
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.7.0
>            Reporter: Sailesh Mukil
>            Assignee: Sailesh Mukil
>            Priority: Major
>              Labels: performance, scalability
>             Fix For: Impala 2.12.0
>
>
> The process-wide client_request_state_map_lock_ can be a highly contended 
> lock. It is currently a plain mutex.
> Changing it to a reader-writer lock, and acquiring it in writer mode only 
> when strictly necessary, could give us some big wins for relatively little effort.
> While running some tests, I've also noticed that changing it to a RW lock 
> reduces the number of client connections created between nodes. The reason is 
> that ReportExecStatus(), which runs in the context of a thrift connection thread, 
> waits for fairly long periods for the client_request_state_map_lock_ 
> on heavily loaded systems, blocking the RPC sender on the RPC. This 
> in turn forces other threads on the sender node to create new client 
> connections to send RPCs instead of reusing old ones, which contributes to 
> nodes hitting their connection limits.
> So, this relatively small change can give us wins not just in performance 
> but in scalability too.
> EDIT: Trying a WIP patch with RW locks on a larger cluster has shown that it 
> works very well in some cases but regresses at other times. The 
> reason is that we can't decide whether to starve readers or writers, and 
> performance varies drastically with the workload and the starvation 
> policy. We need to come up with other ideas to address this issue. (One 
> suggestion is a bucketed locking scheme, where we split 
> the query_exec_state_map_ into buckets, each with its own lock, 
> thereby reducing lock contention.)
> EDIT 2 (10/23/17): A simple, tried-and-tested approach is to shard both the 
> lock and the map it protects, allowing better parallel access to 
> the client request states. Also, more recent perf runs have shown 
> qs_map_lock_ to be a frequent point of contention too, so we're accounting 
> for that in this JIRA as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
