[ 
https://issues.apache.org/jira/browse/HBASE-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867911#action_12867911
 ] 

stack commented on HBASE-2486:
------------------------------

@Eugene

Your xml comment instead should move into a "description" element.  See other 
configs. for the model:

{code}
+    <!-- how to deal with situations such as
+        inconsistent beliefs between master and a regionserver concerning a 
region's location
+        (HBASE-2486) 
+        Two values are supported now: 'paranoid' (shut master down) 
+         or 'lax' (mark region in question as unassigned).
+      -->
{code}

What happens if the config. is other than lax or paranoid: e.g. the user 
misspells the config?

As a general comment, you do not need to put HBASE-2486 on all your comments.  
Reading them, they are substantive enough w/o need of citing hbase-2486.

In checkNSRERegion, your comments start off w/ the number 3?

Do you need to do this: +    String nsreRegion = 
Bytes.toString(nsreMsg.getMessage());?  Can't you put the offending region into 
the HMsg as its HRegionInfo payload?  (The other HMsgs work this way)

You don't need to put method name in log message; e.g. checkNSRERegion().

Does this work?  +    if 
(master.regionManager.regionIsInTransition(nsreRegion)) {  Is 
regionsIntransition keyed by regionname as String  (I haven't looked).

Regards logging, I'd say logging 'normal' status -- e.g. that we're in 
'consistent' state -- is just going to make for noise in the logs.  Log the 
anomaly's only I'd say.

FYI, "// assumption: there is only one -ROOT- region, and it's called 
'-ROOT-,,0'" is not an assumption, its a fact.

Do you want to start the count at '0' instead?  +      int regionCount = 1;

Don't do this:

+      LOG.debug("NSRE exception came from region server       : " + 
nsreServerAddress.toString());
+      LOG.debug("according to regionManager, region server is : " + 
regionServerBelief);

Put them together in the one message.

Same for the Log.error later.

A log message that spans two log events is hard to grep for in logs.  One event 
per log is usually best.

Yeah, don't bother logging the 3.a consistent state.  Its 'normal' so doesn't 
warrant logging.

Regards this method generally, if you did stuff like pass in the meta map of 
online regions as a param, and made the method static, then you might be able 
to write unit tests to cover a good portion of what is done in this method.  
For the fetch against the meta table, you might consider mocking this using 
something like the mockito framework.  Unit tests would help here alot Eugene 
especially as this is you getting stuck in for the first time.  Unit tests will 
help you understand better what is being passed around.

I see now why you can't put an HRegionInfo as the second arg. to HMsg.  Rather 
than make a fake HRI, I'd say, make HMsg work w/ a null.... 

{code}
+    // HBASE 2486: 
+    // "2) when a region sends a heartbeat, include a message for each of 
these regions, 
+    //     MSG_REPORT_NSRE or somesuch, and then clear the set"
+    byte[] nsre_region;
+    // note that pollFirst() removes the first element from nsreSet as a 
side-effect of returning that element.
+    // 
http://java.sun.com/javase/6/docs/api/java/util/NavigableSet.html#pollFirst()
+    while ((nsre_region = nsreSet.pollFirst()) != null) {
+      // create an empty 'fakeRegion', since HMsg's second argument
+      // (an HRegionInfo) must not be null.
+      HRegionInfo fake_region = new HRegionInfo();
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("sending HMsg(MSG_REPORT_NSRE,fake_region,'" + 
Bytes.toString(nsre_region) + "') to master..");
+      }
+      getOutboundMsgs().add(new HMsg(HMsg.Type.MSG_REPORT_NSRE, fake_region, 
nsre_region));
{code}

I also now see why the numbering of items.   You are trying to bring the 
comment from the issue over into the code.  I'd say leave that out.  Your 
comments stand by themselves w/o reference back to the issue.

Good stuff.

> Add simple "anti-entropy" for region assignment
> -----------------------------------------------
>
>                 Key: HBASE-2486
>                 URL: https://issues.apache.org/jira/browse/HBASE-2486
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.20.5
>            Reporter: Todd Lipcon
>            Assignee: Eugene Koontz
>             Fix For: 0.21.0
>
>         Attachments: hbase2486.diff
>
>
> We've seen a number of bugs where a region server thinks it should not be 
> serving a region, but the master and META think it should be. I'd like to 
> propose a very simple way of fixing this issue:
> 1) whenever a regionserver throws a NotServingRegionException, it also marks 
> that region id in an RS-wide Set
> 2) when a region sends a heartbeat, include a message for each of these 
> regions, MSG_REPORT_NSRE or somesuch, and then clear the set
> 3) when the master receives MSG_REPORT_NSRE, it does the following checks:
> a) if the region is assigned elsewhere according to META, the NSRE was due to 
> a stale client, ignore
> b) if the region is in transition, ignore
> c) otherwise, we have an inconsistency, and we should take some steps to 
> resolve (eg mark the region unassigned, or exit the master if we are in 
> "paranoid mode")
> Whatever we do, we need to make sure that this is loudly logged, and causes 
> unit tests to fail, when it's detected. This should *not* happen, but when it 
> does, it would be good to recover without addtable.rb, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to