[ 
https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039368#comment-13039368
 ] 

Jean-Daniel Cryans commented on HBASE-3789:
-------------------------------------------

There's one major issue with my current patch and it's that there's a race 
between the master's OpenedRegionHandler and the events thread. It goes like 
this:

 - RS transitions a region to OPENING
 - RS transitions again to OPENING
 - Master receives the first event, reads ZK and sees OPENING
 - RS transitions to OPENED
 - Master receives the second event, reads ZK and sees OPENED instead of 
OPENING, kicks off the OpenedRegionHandler
 - The handler will at some point delete the znode in the ZKW.getNodes 
structure (such a bad method name) before deleting the actual znode
 - Master receives the third event, reads ZK, sees OPENED but finds that 
getNodes doesn't contain the znode and considers this as a new region in 
transition so it adds it back in getNodes()
 - The handler deletes the znode
 - The Master does a no-op when it figures it cannot transition from OPEN to 
OPENED
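
To make the interleaving easier to follow, here is a toy replay of it in 
Python. It's only a sketch: ToyMaster, its methods, the region name and the 
"OFFLINE"/"CLOSING" state names are placeholders I made up, a dict stands in 
for ZK and a set stands in for the ZKW.getNodes structure. None of it is the 
actual HBase code. The toy folds the final no-op into the handling of the 
third event, and the last few lines replay the later unassign that trips over 
the stale entry.

```python
REGION = "region-1"  # hypothetical region name


class ToyMaster:
    """Toy model of the master; none of this is real HBase code."""

    def __init__(self):
        self.zk = {}            # znode name -> state, stands in for ZooKeeper
        self.tracked = set()    # stands in for the ZKW.getNodes structure
        self.region_state = {}  # master's in-memory view of each region
        self.log = []

    def on_node_event(self, region):
        """Data-change event: read ZK now and react to whatever is there."""
        state = self.zk.get(region)
        if region not in self.tracked:
            # Untracked znode, so it looks like a new region in transition.
            self.tracked.add(region)
            self.log.append("re-added to tracked set")
        if state == "OPENED":
            if self.region_state.get(region) == "OPEN":
                self.log.append("no-op: region already OPEN")
                return False
            self.region_state[region] = "OPEN"
            return True  # caller should run the opened-region handler
        self.region_state[region] = state
        return False

    def on_children_changed(self, region):
        """New-child event: znodes already in the tracked set are skipped."""
        if region in self.tracked:
            self.log.append("child event skipped")
            return False
        self.tracked.add(region)
        return True


def replay_race():
    m = ToyMaster()
    # Assignment: the master creates the znode itself and tracks it.
    m.zk[REGION] = "OFFLINE"
    m.tracked.add(REGION)

    m.zk[REGION] = "OPENING"               # RS: first OPENING (event 1)
    m.zk[REGION] = "OPENING"               # RS: second OPENING (event 2)
    m.on_node_event(REGION)                # event 1: master sees OPENING
    m.zk[REGION] = "OPENED"                # RS: OPENED (event 3)
    run_handler = m.on_node_event(REGION)  # event 2: master sees OPENED
    assert run_handler
    m.tracked.discard(REGION)              # handler, step 1: untrack znode
    m.on_node_event(REGION)                # event 3: re-tracks, then no-op
    del m.zk[REGION]                       # handler, step 2: delete znode

    # Later unassign: the RS creates the znode, the master skips the event.
    m.region_state[REGION] = "PENDING_CLOSE"
    m.zk[REGION] = "CLOSING"
    processed = m.on_children_changed(REGION)
    return m, processed
```

Calling replay_race() leaves the region in the tracked set even though the 
handler deleted its znode, so the child event for the unassign is skipped and 
the region stays in PENDING_CLOSE.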

At this point the region is assigned and everything is "fine"... until the 
master decides for any reason to unassign the region. It sends the 
unassignment, receives an event but doesn't process it in nodeChildrenChanged 
because ZKW.getNodes() already has it. From that point the master will spin in 
"Region has been PENDING_CLOSE for too long" until it's put out of its misery.

The issue here is that the region server is creating the unassigned znode by 
itself, unlike an assignment where it's the master that does it. Doing that in 
the master won't fully solve the issue tho because in 0.92 the RS still creates 
znodes for splits, and there's no way that could be done by the master as it 
would basically be like going back to how it used to work.

So this is what Stack and I thought about:

 - The master needs to create the unassigned znode before telling a RS to close 
a region; the RS will then just update it
 - ZKW needs to stop keeping track of the znodes; it's too easy to get into a 
situation where we have a mismatch
 - The SplitTransaction will still create the znode, but it will then wait at 
the very end until it gets deleted by the master. To make sure the master sees 
the change, it will tickle the znode like we do for OPENING so that the master 
doesn't miss it
 - The method AssignmentManager.nodeChildrenChanged will only put watchers on 
znodes and won't keep track of anything
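
A rough sketch of how the close path could look under that plan, again as a 
Python toy and not the actual patch. StatelessToyMaster, rs_close, the region 
name and the "CLOSING"/"CLOSED" state names are all placeholders of mine, a 
dict stands in for ZK, and the split case (including the tickling) isn't 
modeled at all.

```python
REGION = "region-1"  # hypothetical region name


class StatelessToyMaster:
    """Toy model of the proposal; not real HBase code."""

    def __init__(self):
        self.zk = {}             # znode name -> state, stands in for ZooKeeper
        self.region_state = {}   # master's in-memory view of each region
        self.watch_requests = 0  # how many times a watch was (re)set

    def on_children_changed(self, children):
        """Only set watchers; keep no record of which znodes exist."""
        self.watch_requests += len(children)

    def unassign(self, region):
        """Create the znode first, then (conceptually) tell the RS to close."""
        self.zk[region] = "CLOSING"
        self.region_state[region] = "PENDING_CLOSE"

    def on_node_event(self, region):
        """Data-change event: ZK is the only source of truth."""
        if self.zk.get(region) == "CLOSED":
            self.region_state[region] = "CLOSED"
            del self.zk[region]


def rs_close(zk, region):
    """The RS only updates a znode the master already created."""
    if region not in zk:
        raise KeyError("master has not created the znode for " + region)
    zk[region] = "CLOSED"


def replay_close():
    m = StatelessToyMaster()
    m.region_state[REGION] = "OPEN"
    m.unassign(REGION)               # master creates the znode up front
    m.on_children_changed([REGION])  # only a watch is set, nothing is tracked
    rs_close(m.zk, REGION)           # RS updates the existing znode
    m.on_node_event(REGION)          # master reads ZK and finishes the close
    return m
```

Since there's no second structure to fall out of sync with ZK, the close 
event can't be skipped the way it is with the tracked set.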

> Cleanup the locking contention in the master
> --------------------------------------------
>
>                 Key: HBASE-3789
>                 URL: https://issues.apache.org/jira/browse/HBASE-3789
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.2
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3789.patch
>
>
> The new master uses a lot of synchronized blocks to be safe, but it only 
> takes a few jstacks to see that there are multiple layers of lock contention 
> when a bunch of regions are moving (like when the balancer runs). The main 
> culprits are regionInTransition in AssignmentManager, ZKAssign that uses 
> ZKW.getZNnodes (basically another set of regions in transition), and locking 
> at the RegionState level. 
> My understanding is that even tho we have multiple threads to handle regions 
> in transition, everything is actually serialized. Most of the time, lock 
> holders are talking to ZK or a region server, which can take a few 
> milliseconds.
> A simple example is when AssignmentManager wants to update the timers for all 
> the regions on a RS, it will usually be waiting on another thread that's 
> holding the lock while talking to ZK.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
