[ https://issues.apache.org/jira/browse/HBASE-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koontz updated HBASE-2486:
---------------------------------

    Attachment: hbase2486.diff

Hi Jonathan and Stack,

Thanks again for your comments. I have created a new patch against trunk that 
addresses some of Stack's review comments; I respond to each of them here, in 
the comment accompanying the attachment:


"Your xml comment instead should move into a "description" element. See other 
configs. for the model:"

Fixed in hbase-default.xml in the attached patch: the XML comment text has been 
moved into a <description> element.

"What happens if the config. is other than lax or paranoid: e.g. the user 
misspells the config?"

I decided to treat any value other than "lax" or "paranoid" (e.g. a misspelling) 
the same as the default, "lax", except that the code also issues a LOG.warn() to 
alert the administrator.
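The fallback described above might look roughly like the following. It is a sketch, not the code in the attached patch: the enum NsreMode and its parseMode method are hypothetical names, java.util.logging stands in for HBase's own logger, and an absent setting (null) is assumed to select the default silently, while an unrecognized value triggers a warning.

```java
import java.util.logging.Logger;

// Hypothetical sketch: map the configured mode string to an enum,
// falling back to the default ("lax") for anything unrecognized.
enum NsreMode {
    LAX, PARANOID;

    private static final Logger LOG = Logger.getLogger(NsreMode.class.getName());

    /** Returns PARANOID only for "paranoid"; null and unknown values yield LAX. */
    static NsreMode parseMode(String value) {
        if (value == null || value.equals("lax")) {
            return LAX; // unset or explicitly "lax": the default, no warning
        }
        if (value.equals("paranoid")) {
            return PARANOID;
        }
        // Anything else (e.g. a typo such as "paranoyd") is treated as the default,
        // but the administrator is told about it in a single log event.
        LOG.warning("Unrecognized NSRE mode '" + value + "'; using default 'lax'");
        return LAX;
    }
}
```

Warning rather than failing keeps a misconfigured cluster running in the safer, non-fatal mode while still making the mistake visible in the logs.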

"As a general comment, you do not need to put HBASE-2486 on all your comments. 
Reading them, they are substantive enough w/o need of citing hbase-2486."

"You don't need to put method name in log message; e.g. checkNSRERegion()."

Removed most of these, except for two "(See HBASE-2486)" references, which I 
kept for consistency with other existing "(See HBASE-XXXX)" comments.

"A log message that spans two log events is hard to grep for in logs. One event 
per log is usually best."

"don't bother logging the 3.a consistent state. Its 'normal' so doesn't warrant 
logging."

Replaced log events that spanned multiple lines with single-line log events, and 
removed the logging of non-anomalous events altogether.

"Rather than make a fake HRI, I'd say, make HMsg work w/ a null...."

Not yet addressed: I plan to do so after writing some unit tests, which this 
patch still lacks.

"I also now see why the numbering of items. You are trying to bring the comment 
from the issue over into the code. I'd say leave that out. Your comments stand 
by themselves w/o reference back to the issue."

Removed issue-referencing numbering from comments; made comments stand on their 
own.


> Add simple "anti-entropy" for region assignment
> -----------------------------------------------
>
>                 Key: HBASE-2486
>                 URL: https://issues.apache.org/jira/browse/HBASE-2486
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.20.5
>            Reporter: Todd Lipcon
>            Assignee: Eugene Koontz
>             Fix For: 0.21.0
>
>         Attachments: hbase2486.diff, hbase2486.diff
>
>
> We've seen a number of bugs where a region server thinks it should not be 
> serving a region, but the master and META think it should be. I'd like to 
> propose a very simple way of fixing this issue:
> 1) whenever a regionserver throws a NotServingRegionException, it also marks 
> that region id in an RS-wide Set
> 2) when a region sends a heartbeat, include a message for each of these 
> regions, MSG_REPORT_NSRE or somesuch, and then clear the set
> 3) when the master receives MSG_REPORT_NSRE, it does the following checks:
> a) if the region is assigned elsewhere according to META, the NSRE was due to 
> a stale client, ignore
> b) if the region is in transition, ignore
> c) otherwise, we have an inconsistency, and we should take some steps to 
> resolve (eg mark the region unassigned, or exit the master if we are in 
> "paranoid mode")
> Whatever we do, we need to make sure that this is loudly logged, and causes 
> unit tests to fail, when it's detected. This should *not* happen, but when it 
> does, it would be good to recover without addtable.rb, etc.
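The three steps in the issue description can be sketched in Java as below. This is a hedged illustration rather than the code in hbase2486.diff: region and server names are plain strings, META and the regions-in-transition set are modeled as in-memory collections, and the names NsreAntiEntropySketch, recordNsre, drainForHeartbeat and checkNsreReport are hypothetical. Throwing IllegalStateException stands in for exiting the master in "paranoid mode", and removing the META entry stands in for marking the region unassigned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical sketch of the NSRE "anti-entropy" steps; not HBase's real classes.
class NsreAntiEntropySketch {

    /** Steps 1 and 2, regionserver side: an RS-wide set of regions that hit an NSRE. */
    static class RegionServerSide {
        private final Set<String> nsreRegions = new HashSet<>();

        /** Called wherever a NotServingRegionException is thrown. */
        synchronized void recordNsre(String regionName) {
            nsreRegions.add(regionName);
        }

        /** Called when building a heartbeat: one entry per region, then the set is cleared. */
        synchronized List<String> drainForHeartbeat() {
            List<String> reports = new ArrayList<>(nsreRegions);
            nsreRegions.clear();
            return reports;
        }
    }

    enum Verdict { STALE_CLIENT, IN_TRANSITION, INCONSISTENT }

    /** Step 3, master side: classify each NSRE report. */
    static class MasterSide {
        private static final Logger LOG = Logger.getLogger(MasterSide.class.getName());
        private final Map<String, String> metaAssignments; // region -> server, per META
        private final Set<String> regionsInTransition;
        private final boolean paranoid;

        MasterSide(Map<String, String> meta, Set<String> inTransition, boolean paranoid) {
            this.metaAssignments = new HashMap<>(meta);
            this.regionsInTransition = new HashSet<>(inTransition);
            this.paranoid = paranoid;
        }

        String metaLocation(String regionName) {
            return metaAssignments.get(regionName);
        }

        Verdict checkNsreReport(String regionName, String reportingServer) {
            String assignedTo = metaAssignments.get(regionName);
            // (a) META places the region on another server: a stale client caused the NSRE.
            if (assignedTo != null && !assignedTo.equals(reportingServer)) {
                return Verdict.STALE_CLIENT;
            }
            // (b) The region is in transition, so an NSRE is expected: ignore it.
            if (regionsInTransition.contains(regionName)) {
                return Verdict.IN_TRANSITION;
            }
            // (c) Inconsistency: log it loudly, as one single-line event.
            LOG.severe("NSRE inconsistency: region " + regionName + " reported by "
                    + reportingServer + ", META location " + assignedTo);
            if (paranoid) {
                // Stand-in for exiting the master in paranoid mode.
                throw new IllegalStateException("Inconsistent assignment for " + regionName);
            }
            // Stand-in for marking the region unassigned so it can be reassigned.
            metaAssignments.remove(regionName);
            return Verdict.INCONSISTENT;
        }
    }
}
```

Because recordNsre and drainForHeartbeat synchronize on the same object, an NSRE recorded while a heartbeat is being built is either included in that heartbeat or kept for the next one, never dropped.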

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
