[
https://issues.apache.org/jira/browse/HBASE-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658891#action_12658891
]
Jim Kellerman commented on HBASE-543:
-------------------------------------
@Andrew
Well, the good news is that this problem prevented an inconsistent state in the
master, as ProcessRegionOpen would have updated the meta with the original
server when, in fact, it was being told to close that region.
The bad news, of course, is that the region rebalancing did not work properly.
unassignSomeRegions should not choose regions that are unassigned, assigned or
pending.
@Stack
Yes, the lock on RegionManager is broad; however, it was the only way I could
see to guard multiple operations that affect both the regionsInTransition map
and the onlineMetaRegions map, which happen in a couple of places. Separate
locks for regionsInTransition and onlineMetaRegions would be more
deadlock-prone, I thought. With this approach, every method that performs
multiple operations on either map either grabs the RegionManager's monitor or
waits while the current owner of the monitor does its thing and gets out. I
don't think I hold RegionManager's monitor over any long-running operation,
but I will reverify that.
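A minimal sketch of the single-monitor idea, for illustration only. The class
RegionManagerSketch, its method names, and the String keys and values are
hypothetical simplifications, not the code in the patch; only the two map
names come from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the master's RegionManager.
class RegionManagerSketch {
  // Both maps are guarded by this object's monitor (assumption: region name
  // mapped to server name; the real maps hold richer types).
  private final Map<String, String> regionsInTransition =
      new HashMap<String, String>();
  private final Map<String, String> onlineMetaRegions =
      new HashMap<String, String>();

  // Single-map update; still takes the monitor so it cannot interleave
  // with a compound operation below.
  synchronized void startOpening(String region, String server) {
    regionsInTransition.put(region, server);
  }

  // Compound operation touching both maps. Holding the one monitor for the
  // whole method means no other thread sees the region in neither map
  // or in both.
  synchronized boolean metaRegionOpened(String region) {
    String server = regionsInTransition.remove(region);
    if (server == null) {
      return false; // region was not in transition; nothing to do
    }
    onlineMetaRegions.put(region, server);
    return true;
  }

  synchronized boolean isInTransition(String region) {
    return regionsInTransition.containsKey(region);
  }

  synchronized String onlineServerFor(String region) {
    return onlineMetaRegions.get(region);
  }
}
```

With one monitor there is no lock ordering to get wrong, which is the
deadlock argument above; the cost is that unrelated operations on the two
maps serialize behind each other.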
> A region's state is kept in several places in the master opening the
> possibility for race conditions
> ----------------------------------------------------------------------------------------------------
>
> Key: HBASE-543
> URL: https://issues.apache.org/jira/browse/HBASE-543
> Project: Hadoop HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.1.0, 0.1.1, 0.2.0
> Reporter: Jim Kellerman
> Assignee: Jim Kellerman
> Fix For: 0.19.0
>
> Attachments: 543.patch, 543.patch, 543.patch, 543.patch-4,
> 543.patch-5, apurtell-HMaster-20081223-1.log.zip, recent-changes.patch,
> regionstate.txt
>
>
> A region's state exists in multiple maps in the RegionManager:
> unassignedRegions, pendingRegions, regionsToClose, closingRegions,
> regionsToDelete, etc.
> One of these race conditions was found in HBASE-534.
> For HBase-0.1.x, we should just patch the holes we find.
> The ultimate solution (which requires a lot of changes in HMaster) should be
> applied to HBase trunk.
> Proposed solution:
> Create a class that encapsulates a region's state and provide synchronized
> access to the class that validates state changes.
> There should be a single structure that holds regions in these transitional
> states and it should be a synchronized collection of some kind.
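The proposed solution quoted above could look roughly like the following
sketch. Everything here is an assumption made for illustration: the class
names, the State values, and the set of allowed transitions are hypothetical
and are not taken from any of the attached patches.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class encapsulating one region's state.
class RegionStateSketch {
  // Assumed state names; the real set of states may differ.
  enum State { UNASSIGNED, PENDING_OPEN, OPEN, CLOSING, CLOSED }

  private State state = State.UNASSIGNED;

  // Assumed transition table: which target states are legal from each state.
  private static boolean allowed(State from, State to) {
    switch (from) {
      case UNASSIGNED:   return to == State.PENDING_OPEN;
      case PENDING_OPEN: return to == State.OPEN || to == State.UNASSIGNED;
      case OPEN:         return to == State.CLOSING;
      case CLOSING:      return to == State.CLOSED;
      case CLOSED:       return to == State.UNASSIGNED;
      default:           return false;
    }
  }

  // Synchronized, validated state change: an illegal transition is
  // rejected and leaves the current state untouched.
  synchronized boolean transition(State to) {
    if (!allowed(state, to)) {
      return false;
    }
    state = to;
    return true;
  }

  synchronized State get() {
    return state;
  }
}

// Hypothetical single structure holding all regions in transitional states.
class RegionsInTransitionSketch {
  private final Map<String, RegionStateSketch> regions =
      Collections.synchronizedMap(new HashMap<String, RegionStateSketch>());

  // Returns the existing state object for the region, creating it once.
  RegionStateSketch stateOf(String regionName) {
    synchronized (regions) { // make check-then-put atomic
      RegionStateSketch s = regions.get(regionName);
      if (s == null) {
        s = new RegionStateSketch();
        regions.put(regionName, s);
      }
      return s;
    }
  }
}
```

Because each region has exactly one state object, a region cannot be in two
of the old maps at once, and a caller that loses a race simply gets false
back from transition() instead of silently overwriting the state.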