[
https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039368#comment-13039368
]
Jean-Daniel Cryans commented on HBASE-3789:
-------------------------------------------
My current patch has one major issue: a race between the master's
OpenedRegionHandler and the events thread. It goes like this:
- RS transitions a region to OPENING
- RS transitions again to OPENING
- Master receives the first event, reads ZK and sees OPENING
- RS transitions to OPENED
- Master receives the second event, reads ZK and sees OPENED instead of
OPENING, kicks off the OpenedRegionHandler
- The handler will at some point delete the znode in the ZKW.getNodes
structure (such a bad method name) before deleting the actual znode
- Master receives the third event, reads ZK, sees OPENED but finds that
getNodes doesn't contain the znode and considers this as a new region in
transition so it adds it back in getNodes()
- The handler deletes the znode
- The Master does a no-op when it figures it cannot transition from OPEN to
OPENED
At this point the region is assigned and everything is "fine"... until the
master decides for any reason to unassign the region. It sends the
unassignment, receives an event but doesn't process it in nodeChildrenChanged
because ZKW.getNodes() already has it. From that point the master will spin in
"Region has been PENDING_CLOSE for too long" until it's put out of its misery.
The issue here is that the region server is creating the unassigned znode by
itself, unlike an assignment where it's the master that does it. Doing that in
the master won't fully solve the issue though, because in 0.92 the RS still
creates znodes for splits, and there's no way that could be done by the master
as it would basically be like going back to how it used to work.
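The interleaving above can be replayed deterministically in a few lines. This
is only a sketch, not HBase code: the two sets stand in for the structure
behind ZKW.getNodes() and for the znodes that really exist in ZK, and every
class and method name below is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Single-threaded replay of the race; all names here are hypothetical. */
public class StaleTrackingSketch {
    // Stand-in for the set behind ZKW.getNodes(): znodes the master thinks it tracks.
    static final Set<String> trackedNodes = new HashSet<>();
    // Stand-in for the znodes that actually exist in ZooKeeper.
    static final Set<String> zkNodes = new HashSet<>();

    /**
     * Events thread: a znode missing from the tracked set is treated as a new
     * region in transition and added. Returns false when the znode was already
     * tracked, in which case the event is skipped.
     */
    static boolean onNodeEvent(String znode) {
        return trackedNodes.add(znode);
    }

    /** Replays the open sequence; returns true if a stale tracked entry remains. */
    static boolean replayOpenRace(String znode) {
        trackedNodes.clear();
        zkNodes.clear();
        zkNodes.add(znode);          // znode exists during the open
        trackedNodes.add(znode);     // first event: master starts tracking it

        trackedNodes.remove(znode);  // handler, step 1: drop from the tracked set
        onNodeEvent(znode);          // third event lands here and re-adds the znode
        zkNodes.remove(znode);       // handler, step 2: delete the actual znode

        return trackedNodes.contains(znode) && !zkNodes.contains(znode);
    }

    /** Later unassign: the RS creates the znode itself, then the event arrives. */
    static boolean laterCloseEventProcessed(String znode) {
        zkNodes.add(znode);
        return onNodeEvent(znode);   // false: the stale entry hides the event
    }

    public static void main(String[] args) {
        boolean stale = replayOpenRace("region-1");
        boolean processed = laterCloseEventProcessed("region-1");
        System.out.println("stale entry after open: " + stale);
        System.out.println("close event processed: " + processed);
    }
}
```

In this model the stale entry survives the open, and the event for the later
close is skipped because the tracked set already contains the znode.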
So this is what Stack and I thought about:
- The master needs to create the unassigned znode before telling a RS to close
a region, the RS will now just update it
- ZKW needs to stop keeping track of the znodes, getting into a situation
where we have a mismatch is too easy
- The SplitTransaction will still create the znode, but it will then wait at
the very end until it gets deleted by the master. To make sure the master sees
the change, it will tickle the znode like we do for OPENING so that the master
doesn't miss it
- The method AssignmentManager.nodeChildrenChanged will only put watchers on
znodes and won't keep track of anything
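A rough sketch of what the close path could look like under these points,
assuming an in-memory map in place of ZooKeeper. The class, the method names
and the "CLOSING"/"CLOSED" strings are illustrative only, not the actual HBase
API or the final patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the proposed close protocol; all names here are hypothetical. */
public class MasterCreatesZnodeSketch {
    // Stand-in for ZooKeeper: znode name -> state stored in it.
    static final Map<String, String> zk = new HashMap<>();
    // Watches the master has set; the only thing recorded, no tracked-znode set.
    static final List<String> watches = new ArrayList<>();

    /** Master: create the unassigned znode before asking the RS to close. */
    static void masterUnassign(String region) {
        zk.put(region, "CLOSING");
        // ... the close request to the region server would be sent here ...
    }

    /** RS: only updates an existing znode; returns false if the master never created it. */
    static boolean regionServerClosed(String region) {
        return zk.replace(region, "CLOSED") != null;
    }

    /** Events thread: put a watcher on every child and keep no local copy. */
    static void nodeChildrenChanged(List<String> children) {
        watches.addAll(children);
    }

    public static void main(String[] args) {
        System.out.println("update without znode: " + regionServerClosed("region-1"));
        masterUnassign("region-1");
        nodeChildrenChanged(Arrays.asList("region-1"));
        System.out.println("update after create: " + regionServerClosed("region-1"));
        System.out.println("state in zk: " + zk.get("region-1"));
    }
}
```

Since the RS can only update a znode the master created, there is no local
set left to fall out of sync with ZK in this model.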
> Cleanup the locking contention in the master
> --------------------------------------------
>
> Key: HBASE-3789
> URL: https://issues.apache.org/jira/browse/HBASE-3789
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.90.2
> Reporter: Jean-Daniel Cryans
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: HBASE-3789.patch
>
>
> The new master uses a lot of synchronized blocks to be safe, but it only
> takes a few jstacks to see that there are multiple layers of lock contention
> when a bunch of regions are moving (like when the balancer runs). The main
> culprits are regionInTransition in AssignmentManager, ZKAssign that uses
> ZKW.getZNnodes (basically another set of regions in transition), and locking
> at the RegionState level.
> My understanding is that even though we have multiple threads to handle regions
> in transition, everything is actually serialized. Most of the time, lock
> holders are talking to ZK or a region server, which can take a few
> milliseconds.
> A simple example: when AssignmentManager wants to update the timers for all
> the regions on an RS, it will usually be waiting on another thread that's
> holding the lock while talking to ZK.