[ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:
--------------------------------------

    Attachment: HBASE-3789-trunk.patch

Patch for trunk with the split fixes. I had to remove a test because the case
it covered is no longer an issue (the master now creates the znode when closing
a region). I then had to make several fixes to AssignmentManager for cases
where we report regions that are already split or that skipped steps, and
finally I added the code that waits for the master to delete the znode.

One thing I might clean up further is the latter part of SplitTransaction,
which has a few methods that all look the same. Also, I'm not thrilled about
having to sleep while waiting on the master, but that was the easiest way.
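Waiting for the master to delete the znode by sleeping amounts to a
poll-and-sleep loop. A minimal sketch, assuming the znode check is passed in
as a predicate (the class WaitForZnodeDeletion, the method waitUntilDeleted
and its parameters are hypothetical names, not the code in the attached patch):

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not from the HBase patch: wait for a znode to go away. */
public final class WaitForZnodeDeletion {

  private WaitForZnodeDeletion() {}

  /**
   * Polls until znodeExists returns false, sleeping between checks.
   *
   * @param znodeExists   hypothetical check that returns true while the znode is still present
   * @param sleepMillis   pause between two checks
   * @param timeoutMillis total time to wait before giving up
   * @return true if the znode was deleted in time, false on timeout or interrupt
   */
  public static boolean waitUntilDeleted(BooleanSupplier znodeExists,
                                         long sleepMillis,
                                         long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (znodeExists.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        return false; // the master did not delete the znode in time
      }
      try {
        Thread.sleep(sleepMillis); // the "sleep to wait on the master"
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }
}
```

Polling keeps the code simple at the cost of some latency and extra ZK reads.
A ZooKeeper watch set through exists() would fire on deletion instead, but
needs more plumbing.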

> Cleanup the locking contention in the master
> --------------------------------------------
>
>                 Key: HBASE-3789
>                 URL: https://issues.apache.org/jira/browse/HBASE-3789
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3789-trunk.patch, HBASE-3789-v3-0.90.patch
>
>
> The new master uses a lot of synchronized blocks to be safe, but it only 
> takes a few jstacks to see that there are multiple layers of lock contention 
> when a bunch of regions are moving (like when the balancer runs). The main 
> culprits are regionInTransition in AssignmentManager, ZKAssign, which uses 
> ZKW.getZNnodes (basically another set of regions in transition), and locking 
> at the RegionState level.
> My understanding is that even though we have multiple threads to handle 
> regions in transition, everything is actually serialized. Most of the time, 
> lock holders are talking to ZK or a region server, which can take a few 
> milliseconds.
> A simple example: when AssignmentManager wants to update the timers for all 
> the regions on a RS, it usually ends up waiting on another thread that is 
> holding the lock while talking to ZK.
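One common way to keep a timer refresh like this from queueing behind a thread
that holds a lock while talking to ZK is to keep the in-memory
regions-in-transition state in a concurrent map and never make remote calls
while holding a shared monitor. A minimal sketch under that assumption
(RegionsInTransitionSketch, State, updateTimers and the other names are
hypothetical; this is not how the attached patches restructure
AssignmentManager):

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch, not HBase code: regions in transition tracked in a
 * ConcurrentHashMap instead of behind one synchronized block.
 */
public final class RegionsInTransitionSketch {

  /** Minimal stand-in for a region's transition state. */
  private static final class State {
    final String serverName;
    volatile long stamp; // last time this transition was refreshed

    State(String serverName, long stamp) {
      this.serverName = serverName;
      this.stamp = stamp;
    }
  }

  private final ConcurrentHashMap<String, State> rit = new ConcurrentHashMap<>();

  /** Records a region as in transition on the given server. */
  public void add(String regionName, String serverName, long now) {
    rit.put(regionName, new State(serverName, now));
  }

  /**
   * Refreshes the timestamp of every region in transition on serverName.
   * Iterating a ConcurrentHashMap takes no map-wide lock, so this call is not
   * stuck behind another thread that is busy talking to ZK.
   *
   * @return the number of regions whose timer was refreshed
   */
  public int updateTimers(String serverName, long now) {
    int updated = 0;
    for (State s : rit.values()) {
      if (s.serverName.equals(serverName)) {
        s.stamp = now;
        updated++;
      }
    }
    return updated;
  }

  /** Returns the last refresh time recorded for regionName. */
  public long stampOf(String regionName) {
    return rit.get(regionName).stamp;
  }
}
```

The trade-off is that iteration is only weakly consistent: a region added or
removed concurrently may or may not be seen by a given refresh, which is
usually acceptable for timeout bookkeeping.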

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
