[ 
https://issues.apache.org/jira/browse/HBASE-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974302#action_12974302
 ] 

stack commented on HBASE-3381:
------------------------------

Sorry, the above stack trace IS evidence that the latest spin on this patch is 
working (it's below).  We were stuck in CatalogTracker waiting on .META. to come 
back, and it was going on too long, so the worker thread was interrupted and 
the open of the region closed up (see the 'Interrupting thread' message above; 
the stack trace is actually out of the worker thread named 
PostOpenDeployTasksThread).

Here is the patch I'm committing:

{code}
diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java b/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
index d3c78e1..28bdfb9 100644
--- a/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
@@ -160,8 +160,18 @@ public class OpenRegionHandler extends EventHandler {
     // Is thread still alive?  We may have left above loop because server is
     // stopping or we timed out the edit.  Is so, interrupt it.
     if (t.isAlive()) {
-      LOG.debug("Interrupting thread " + t);
-      t.interrupt();
+      if (!signaller.get()) {
+        // Thread still running; interrupt
+        LOG.debug("Interrupting thread " + t);
+        t.interrupt();
+      }
+      try {
+        t.join();
+      } catch (InterruptedException ie) {
+        LOG.warn("Interrupted joining " +
+          r.getRegionInfo().getRegionNameAsString(), ie);
+        Thread.currentThread().interrupt();
+      }
     }
     // Was there an exception opening the region?  This should trigger on
     // InterruptedException too.  If so, we failed.
{code}

We only interrupt if our signaller has NOT been set (it will not be set if we 
are stuck trying to update meta).  We then join on the thread so we have a 
chance to pick up any exceptions, and so return the right result out of this 
method.
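
For illustration only, here is a minimal standalone sketch of the same 
interrupt-unless-signalled-then-join pattern.  It is not the HBase code: the 
class InterruptThenJoinSketch, the Worker thread, runWithTimeout and the sleep 
that stands in for the wait on .META. are all made up for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these names come from HBase.
class InterruptThenJoinSketch {

  /** Stand-in for the worker thread that runs the post-open tasks. */
  static class Worker extends Thread {
    private final AtomicBoolean signaller;
    private final long workMillis;
    private volatile Throwable exception;

    Worker(AtomicBoolean signaller, long workMillis) {
      super("Worker");
      this.signaller = signaller;
      this.workMillis = workMillis;
    }

    @Override
    public void run() {
      try {
        // Assumption: a sleep stands in for blocking on the catalog table.
        Thread.sleep(workMillis);
        signaller.set(true); // work finished; tell the caller
      } catch (InterruptedException e) {
        exception = e; // record the failure so the caller can see it
      }
    }

    Throwable getException() {
      return exception;
    }
  }

  /** Returns true only if the worker signalled completion without error. */
  static boolean runWithTimeout(long workMillis, long timeoutMillis) {
    AtomicBoolean signaller = new AtomicBoolean(false);
    Worker t = new Worker(signaller, workMillis);
    t.start();
    long deadline = System.currentTimeMillis() + timeoutMillis;
    // Poll until the worker signals, dies, or we run out of time.
    while (!signaller.get() && t.isAlive()
        && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(5);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    if (t.isAlive()) {
      if (!signaller.get()) {
        // Worker is still stuck; interrupt it.
        t.interrupt();
      }
      try {
        // Wait for the worker to exit so its recorded exception is visible.
        t.join();
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    }
    // Without the join above, the exception could be missed and an
    // interrupted run could be reported as a success.
    return signaller.get() && t.getException() == null;
  }

  public static void main(String[] args) {
    System.out.println(runWithTimeout(10, 2000));  // worker finishes in time
    System.out.println(runWithTimeout(5000, 50));  // worker gets interrupted
  }
}
```

The first call should report success and the second failure, since the 
interrupted worker records its InterruptedException before the join returns.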

> Interrupt of a region open comes across as a successful open
> ------------------------------------------------------------
>
>                 Key: HBASE-3381
>                 URL: https://issues.apache.org/jira/browse/HBASE-3381
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.90.0
>
>         Attachments: 3381.txt
>
>
> Meta was offline when below happened:
> {code}
> 2010-12-21 19:45:23,023 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12d0a53c540000e Attempting to transition node 337038b50e467fbd6b031f278bbd9c22 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
> 2010-12-21 19:45:23,046 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12d0a53c540000e Successfully transitioned node 337038b50e467fbd6b031f278bbd9c22 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
> 2010-12-21 19:45:26,379 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Interrupting thread Thread[PostOpenDeployTasks:337038b50e467fbd6b031f278bbd9c22,5,main]
> 2010-12-21 19:45:26,379 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12d0a53c540000e Attempting to transition node 337038b50e467fbd6b031f278bbd9c22 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
> 2010-12-21 19:45:26,381 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Exception running postOpenDeployTasks; region=337038b50e467fbd6b031f278bbd9c22
> org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Interrupted
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:364)
>     at org.apache.hadoop.hbase.catalog.MetaEditor.updateRegionLocation(MetaEditor.java:146)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:1331)
>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:195)
> ...
> {code}
> So, we timed out trying to open the region, but rather than close the region 
> because the edit failed, we missed seeing the InterruptedException.
> Here is suggested fix:
> {code}
> diff --git a/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java b/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
> index 7bf680d..2b0078c 100644
> --- a/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
> +++ b/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
> @@ -339,7 +339,7 @@ public class MetaReader {
>      get.addFamily(HConstants.CATALOG_FAMILY);
>      byte [] meta = getCatalogRegionNameForRegion(regionName);
>      Result r = catalogTracker.waitForMetaServerConnectionDefault().get(meta, get);
> -    if(r == null || r.isEmpty()) {
> +    if (r == null || r.isEmpty()) {
>        return null;
>      }
>      return metaRowToRegionPair(r);
> {code}
> Let me try it.
> Without this, what we see is hbck showing that the region is on server X, 
> but in .META. it shows as being on Y (its pre-balance server).
