[ https://issues.apache.org/jira/browse/KUDU-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-2193.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.0

I fixed this in 1.6.0.

> Severe lock contention on TSTabletManager lock
> ----------------------------------------------
>
>                 Key: KUDU-2193
>                 URL: https://issues.apache.org/jira/browse/KUDU-2193
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> While doing some stress/failure testing on a cluster with many tablets, I 
> ran into the following mess:
> - TSTabletManager::GenerateIncrementalTabletReport holds the 
> TSTabletManager lock in 'read' mode.
> -- It calls CreateReportedTabletPB on a number of tablets that are in the 
> middle of an election storm.
> -- Each such call blocks in RaftConsensus::ConsensusState while the tablet 
> fsyncs its consensus metadata to disk.
> -- As a result, the read lock on the TSTabletManager lock is held for a 
> long time (many seconds, if not tens of seconds).
> - Meanwhile, some other thread is trying to take the TSTabletManager lock 
> for write, and is blocked behind the long-running reader above.
> - rw_spinlock is writer-starvation-free, which means that while a writer is 
> waiting, no new readers can acquire the lock.
> What's worse is that rw_spinlock is a true spin lock, so now there are tens 
> of threads in a 'while (true) sched_yield()' loop, generating over 1.5M 
> context switches per second.
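The convoy described above follows from writer-preference semantics alone. Below is a minimal single-threaded sketch (an assumption for illustration; this is *not* Kudu's actual rw_spinlock implementation, and the class and method names are hypothetical) showing why one slow reader plus one queued writer locks out all new readers:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch of a writer-preference ("writer-starvation-free")
// reader-writer spinlock: once a writer announces intent, new readers are
// refused until the writer has been served.
class WriterPreferenceRwLock {
 public:
  bool try_lock_shared() {
    // A pending or active writer blocks NEW readers -- this is the
    // starvation-freedom guarantee that creates the convoy.
    if (writer_pending_.load() || writer_active_.load()) return false;
    readers_.fetch_add(1);
    // Re-check in case a writer arrived concurrently.
    if (writer_pending_.load() || writer_active_.load()) {
      readers_.fetch_sub(1);
      return false;
    }
    return true;
  }
  void unlock_shared() { readers_.fetch_sub(1); }

  void announce_writer() { writer_pending_.store(true); }
  bool try_lock() {
    // The writer must still wait for existing readers to drain.
    if (readers_.load() != 0 || writer_active_.load()) return false;
    writer_active_.store(true);
    writer_pending_.store(false);
    return true;
  }
  void unlock() { writer_active_.store(false); }

 private:
  std::atomic<int> readers_{0};
  std::atomic<bool> writer_pending_{false};
  std::atomic<bool> writer_active_{false};
};

// Walks through the scenario from the report; returns true if the
// lock exhibits the described convoy behavior.
bool demo() {
  WriterPreferenceRwLock l;
  bool ok = l.try_lock_shared();   // long-running reader (the tablet report)
  l.announce_writer();             // some other thread queues a write
  ok = ok && !l.try_lock();        // writer blocked behind the reader
  ok = ok && !l.try_lock_shared(); // and new readers are now locked out too
  l.unlock_shared();               // slow reader finally releases
  ok = ok && l.try_lock();         // writer proceeds
  l.unlock();
  return ok;
}
```

In the real failure, each refused reader is a spinning thread calling sched_yield() in a loop rather than returning false, which is what produced the 1.5M context switches per second.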



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
