[ 
https://issues.apache.org/jira/browse/HBASE-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868864#action_12868864
 ] 

Eugene Koontz commented on HBASE-2486:
--------------------------------------

I forgot to address this:

"Does this work? + if (master.regionManager.regionIsInTransition(nsreRegion)) { 
Is regionsIntransition keyed by regionname as String (I haven't looked)."

This works in my testing.

"FYI, "// assumption: there is only one ROOT region, and it's called 'ROOT,,0'" 
is not an assumption, it's a fact."

Changed comment to reflect this.

"Do you want to start the count at '0' instead? + int regionCount = 1;"

Removed the regionCount variable, since it was only used to log the normal 
(non-anomalous) case while I was developing the patch.



> Add simple "anti-entropy" for region assignment
> -----------------------------------------------
>
>                 Key: HBASE-2486
>                 URL: https://issues.apache.org/jira/browse/HBASE-2486
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.20.5
>            Reporter: Todd Lipcon
>            Assignee: Eugene Koontz
>             Fix For: 0.21.0
>
>         Attachments: hbase2486.diff, hbase2486.diff
>
>
> We've seen a number of bugs where a region server thinks it should not be 
> serving a region, but the master and META think it should be. I'd like to 
> propose a very simple way of fixing this issue:
> 1) whenever a regionserver throws a NotServingRegionException, it also marks 
> that region id in an RS-wide Set
> 2) when a region server sends a heartbeat, include a message for each of 
> these regions, MSG_REPORT_NSRE or somesuch, and then clear the set
> 3) when the master receives MSG_REPORT_NSRE, it does the following checks:
> a) if the region is assigned elsewhere according to META, the NSRE was due to 
> a stale client, ignore
> b) if the region is in transition, ignore
> c) otherwise, we have an inconsistency, and we should take some steps to 
> resolve (eg mark the region unassigned, or exit the master if we are in 
> "paranoid mode")
> Whatever we do, we need to make sure that this is loudly logged, and causes 
> unit tests to fail, when it's detected. This should *not* happen, but when it 
> does, it would be good to recover without addtable.rb, etc.
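
For readers skimming the thread, the three steps in the description above can 
be sketched roughly as follows. This is only an illustration and not the code 
in the attached hbase2486.diff: the class NsreAntiEntropySketch, the 
NsreTracker helper, the Verdict enum, and the plain Map/Set stand-ins for META 
and the master's regions-in-transition state are all hypothetical names chosen 
for the example.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed anti-entropy checks; not HBase API.
final class NsreAntiEntropySketch {

    // Possible outcomes of the master-side check (step 3 a, b, c).
    enum Verdict { STALE_CLIENT, IN_TRANSITION, INCONSISTENT }

    // Region server side (steps 1 and 2): remember every region for which
    // a NotServingRegionException was thrown, then hand the set over at
    // heartbeat time and clear it.
    static final class NsreTracker {
        private final Set<String> pending = new HashSet<>();

        synchronized void recordNsre(String regionName) {
            pending.add(regionName);
        }

        synchronized Set<String> drainForHeartbeat() {
            Set<String> report = new HashSet<>(pending);
            pending.clear();
            return report;
        }
    }

    // Master side (step 3). metaAssignments maps region name to the server
    // that META says is hosting it; both collections are stand-ins for the
    // real master state.
    static Verdict check(String reportingServer,
                         String regionName,
                         Map<String, String> metaAssignments,
                         Set<String> regionsInTransition) {
        String assignedTo = metaAssignments.get(regionName);
        // a) META points at a different server: the NSRE came from a
        //    stale client, so ignore it.
        if (assignedTo != null && !assignedTo.equals(reportingServer)) {
            return Verdict.STALE_CLIENT;
        }
        // b) The region is in transition: ignore.
        if (regionsInTransition.contains(regionName)) {
            return Verdict.IN_TRANSITION;
        }
        // c) Otherwise the master/META view and the region server
        //    disagree; the caller should log loudly and take action.
        return Verdict.INCONSISTENT;
    }
}
```

What to do on INCONSISTENT (mark the region unassigned, or exit the master in 
"paranoid mode") is deliberately left to the caller, since the description 
leaves that choice open.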

