[
https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020471#comment-13020471
]
Jean-Daniel Cryans commented on HBASE-3789:
-------------------------------------------
I might also add that with this patch, the balancer command, which usually
took 25 seconds to run, now returns in under 1 second.
> Cleanup the locking contention in the master
> --------------------------------------------
>
> Key: HBASE-3789
> URL: https://issues.apache.org/jira/browse/HBASE-3789
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.90.2
> Reporter: Jean-Daniel Cryans
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: HBASE-3789.patch
>
>
> The new master uses a lot of synchronized blocks to be safe, but it only
> takes a few jstacks to see that there are multiple layers of lock contention
> when a bunch of regions are moving (like when the balancer runs). The main
> culprits are regionInTransition in AssignmentManager, ZKAssign that uses
> ZKW.getZNnodes (basically another set of regions in transition), and locking
> at the RegionState level.
> My understanding is that even though we have multiple threads to handle
> regions in transition, everything is actually serialized. Most of the time,
> lock holders are talking to ZK or a region server, which can take a few
> milliseconds.
> A simple example: when AssignmentManager wants to update the timers for all
> the regions on an RS, it will usually be waiting on another thread that's
> holding the lock while talking to ZK.
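
To make the contention pattern described above concrete, here is a minimal
Java sketch. It is not taken from HBASE-3789.patch and does not use any HBase
classes; LockScopeSketch, touchAllHoldingLock, touchAllNarrowLock and the
slowRemoteCall parameter are all hypothetical names, with the callback
standing in for a round trip to ZK or a region server. The first method makes
the slow call while holding the monitor; the second only holds it long enough
to copy the keys and to write each timer back.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class LockScopeSketch {
    // Hypothetical stand-in for a shared map of regions in transition,
    // mapping a region name to its last-updated timestamp.
    private final Map<String, Long> regionsInTransition = new HashMap<>();

    public void add(String region) {
        synchronized (regionsInTransition) {
            regionsInTransition.put(region, 0L);
        }
    }

    // Contended pattern: the slow call runs while the monitor is held, so
    // every other thread that needs the map waits for each round trip.
    public void touchAllHoldingLock(Consumer<String> slowRemoteCall, long now) {
        synchronized (regionsInTransition) {
            for (Map.Entry<String, Long> e : regionsInTransition.entrySet()) {
                slowRemoteCall.accept(e.getKey());
                e.setValue(now);
            }
        }
    }

    // Narrower pattern: copy the keys under the lock, make the slow calls
    // without it, and re-take the lock only briefly for each update.
    public void touchAllNarrowLock(Consumer<String> slowRemoteCall, long now) {
        List<String> snapshot;
        synchronized (regionsInTransition) {
            snapshot = new ArrayList<>(regionsInTransition.keySet());
        }
        for (String region : snapshot) {
            slowRemoteCall.accept(region);
            synchronized (regionsInTransition) {
                // The region may have been removed while the lock was
                // released, so only update entries that still exist.
                regionsInTransition.computeIfPresent(region, (k, v) -> now);
            }
        }
    }

    // Lets a caller observe whether the current thread holds the monitor.
    public boolean isLockHeldByCurrentThread() {
        return Thread.holdsLock(regionsInTransition);
    }

    // Returns the stored timestamp, or null if the region is not tracked.
    public Long timerOf(String region) {
        synchronized (regionsInTransition) {
            return regionsInTransition.get(region);
        }
    }
}
```

The trade-off in the second method is that the map can change between the
snapshot and the write-back, which is why the update is conditional. Whether
that is acceptable depends on the invariants the real code needs to keep.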
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira