[jira] Updated: (HBASE-1104) Doubly-assigned regions redux

Jim Kellerman (JIRA) Thu, 08 Jan 2009 14:23:22 -0800

     [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jim Kellerman updated HBASE-1104:
---------------------------------

    Attachment: 1104.patch.1

> stack - 07/Jan/09 08:42 PM
> Did you mean to add in changes to Index: src/webapps/master/WEB-INF/web.xml?

No, and I'm not sure how it got changed. Reverted.

> Want to add more javadoc to the @return in below (Not important...)
{code}
Index: src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
===================================================================
--- src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java (revision 732591)
+++ src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java (working copy)
@@ -126,6 +126,7 @@

    * @param regionName name of the region to update
    * @param b BatchUpdate
    * @param expectedValues map of column names to expected data values.
+   * @return true if
{code}

Done. It was missing the @return altogether, and I just forgot to finish the
comment.

> Tell me about this change:

{code}
       storedInfo = this.master.serverManager.getServerInfo(serverName);
       deadServer = this.master.serverManager.isDead(serverName);
-      deadServerAndLogsSplit =
-        this.master.serverManager.isDeadServerLogsSplit(serverName);
{code}

> and...

{code}
-      if ((deadServerAndLogsSplit ||
-          (!deadServer && (storedInfo == null ||
-            (storedInfo.getStartCode() != startCode)))) &&
-          this.regionManager.assignable(info)) {
+      if ((deadServer ||
+          (storedInfo == null || storedInfo.getStartCode() != startCode))) {
+
{code}

> It doesn't look right. The changes I made for 1099 were "allow assigning if
> it's a dead server and its commit logs HAVE been split" or "if NOT a
> dead server... because if it's a dead server and it didn't pass the first
> test, then its logs are being split". We don't want BaseScanner
> assigning to servers on the dead list. If regions are assigned to a server
> on the dead list, then when the dead server runs its scan in the shutdown
> handler, we'll reassign these regions as though they'd been on a crashed
> server; that makes for double assignment and a mess.

You're right. It was a half-finished change. What I meant to do was
skip assignment for regions that are offline, in transition, or were
assigned to a dead server, since ProcessServerShutdown already handles
reassigning a dead server's regions.
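
To make the intended check concrete, here is a minimal sketch of the rule
described above. It is not the code in the patch: the class, the
StoredServer type and the shouldAssign method are hypothetical names used
only for illustration, and the real master consults its own server and
region managers for these flags.

```java
// Hypothetical sketch of the intended scanner check; not the HBase master's API.
final class AssignmentGuardSketch {

  /** Minimal stand-in for the server info the master has stored (hypothetical). */
  static final class StoredServer {
    final long startCode;

    StoredServer(long startCode) {
      this.startCode = startCode;
    }
  }

  /**
   * Returns true only if the scanner should hand the region out for assignment.
   * Regions that are offline, in transition, or listed on a dead server are
   * skipped; per the comment above, the shutdown handler deals with the last case.
   */
  static boolean shouldAssign(boolean offline, boolean inTransition,
      boolean onDeadServer, StoredServer storedInfo, long startCode) {
    if (offline || inTransition || onDeadServer) {
      return false;
    }
    // Assign when the server named in the catalog row is unknown to the
    // master, or is known but was started with a different start code.
    return storedInfo == null || storedInfo.startCode != startCode;
  }

  public static void main(String[] args) {
    // Region listed on a dead server: left for the shutdown handler.
    System.out.println(shouldAssign(false, false, true, null, 1L));
    // Region whose server is unknown to the master: assignable.
    System.out.println(shouldAssign(false, false, false, null, 1L));
  }
}
```

With this shape, a region that passes the guard needs no separate
'assignable' test further down, because the guard has already rejected
the offline, in-transition and dead-server cases.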

> You also remove the new method assignable. Don't we want to check if
> region is 'assignable' before dropping into this assigning code block?
> (Not sure... so asking).

If we get this far, we know the region is assignable because of the
test above.

> Your patch does this, which, as discussed on IRC, is not what's wanted:
{code}
@@ -1088,12 +1088,8 @@
       byte [] closestKey = store.getRowKeyAtOrBefore(row);
       // If it happens to be an exact match, we can stop looping.
       // Otherwise, we need to check if it's the max and move to the next
-      if (HStoreKey.equalsTwoRowKeys(regionInfo, row, closestKey)) {
+      if (closestKey != null) {
         key = new HStoreKey(closestKey, this.regionInfo);
-      } else if (closestKey != null &&
-          (key == null || HStoreKey.compareTwoRowKeys(
-              regionInfo,closestKey, key.getRow()) > 0) ) {
-        key = new HStoreKey(closestKey, this.regionInfo);
       } else {
         return null;
       }
{code}

After some discussion with Stack, we determined that neither
implementation was correct. The new code is:
{code}
      // get the closest key. (HStore.getRowKeyAtOrBefore can return null)
      byte [] closestKey = store.getRowKeyAtOrBefore(row);
      // If it happens to be an exact match, we can stop.
      // Otherwise, we need to check if it's the max and move to the next
      if (closestKey != null) {
        if (HStoreKey.equalsTwoRowKeys(regionInfo, row, closestKey)) {
          key = new HStoreKey(closestKey, this.regionInfo);
        }
        if (key == null) {
          key = new HStoreKey(closestKey, this.regionInfo);
        }
      }
      if (key == null) {
        return null;
      }
{code}

> Do you think this is safe, Jim, in the change below?
{code}
@@ -564,9 +566,10 @@
       //       the messages we've received. In this case, a close could be
       //       processed before an open resulting in the master not agreeing on
       //       the region's state.
+      master.regionManager.setClosed(region.getRegionName());
{code}

> Will we have the problem where state changes are processed out of
> order? Thinking on it, it doesn't seem so but asking just to check.

No, I don't think it is a problem, because the region is still in
transition and cannot be reassigned until the RegionState is removed
from the map.
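
The reasoning can be illustrated with a small sketch. Only the setClosed
name comes from the patch; the class, the State enum, and the setClosing,
removeRegion and isAssignable methods are assumptions made for
illustration and may not match RegionManager's actual fields or methods.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a region with an entry in the in-transition map
// cannot be picked for assignment, whatever state that entry is in.
final class RegionsInTransitionSketch {

  enum State { CLOSING, CLOSED }

  private final Map<String, State> regionsInTransition =
      new ConcurrentHashMap<>();

  /** The master has asked a region server to close the region. */
  void setClosing(String regionName) {
    regionsInTransition.put(regionName, State.CLOSING);
  }

  /** The close has been reported; the entry stays in the map. */
  void setClosed(String regionName) {
    regionsInTransition.put(regionName, State.CLOSED);
  }

  /** Close processing has finished; the region may be assigned again. */
  void removeRegion(String regionName) {
    regionsInTransition.remove(regionName);
  }

  /** A region is assignable only when it has no in-transition entry. */
  boolean isAssignable(String regionName) {
    return !regionsInTransition.containsKey(regionName);
  }

  public static void main(String[] args) {
    RegionsInTransitionSketch manager = new RegionsInTransitionSketch();
    String region = "TestTable,0000135598,1230761605500";
    manager.setClosing(region);
    manager.setClosed(region);
    // Still in the map, so the region is not handed out again yet.
    System.out.println(manager.isAssignable(region));
    manager.removeRegion(region);
    System.out.println(manager.isAssignable(region));
  }
}
```

Under these assumptions, marking the region closed early only changes the
recorded state; the region becomes eligible for assignment at the point
where its entry is removed.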



> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch, 1104.patch.1
>
>
> Testing, I see doubly assigned regions.  Below is from master log for 
> TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: 
> TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; 
> daughters: TestTable,0000116170,1230761605500, 
> TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO 
> org.apache.hadoop.hbase.master.RegionManager: assigning region 
> TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: 
> TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO 
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: 
> TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO 
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row 
> TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 
> 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG 
> org.apache.hadoop.hbase.master.RegionManager: Going to close region 
> TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO 
> org.apache.hadoop.hbase.master.RegionManager: assigning region 
> TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received 
> MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from 
> XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: 
> TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO 
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: 
> TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO 
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row 
> TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 
> 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: 
> TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG 
> org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose 
> of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO 
> org.apache.hadoop.hbase.master.RegionManager: assigning region 
> TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received 
> MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from 
> XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: 
> TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO 
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: 
> TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO 
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row 
> TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 
> 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG 
> org.apache.hadoop.hbase.master.BaseScanner: 
> TestTable,0000135598,1230761605500 no longer has references to 
> TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the 
> regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
