[jira] [Commented] (HBASE-9514) Prevent region from assigning before log splitting is done

stack (JIRA) Mon, 16 Sep 2013 15:12:14 -0700

    [ https://issues.apache.org/jira/browse/HBASE-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768813#comment-13768813 ]


stack commented on HBASE-9514:
------------------------------

Here you are adding back the random region move to try and bring on the issue 
again:

-        new FlushRandomRegionOfTableAction(tableName)
+        new FlushRandomRegionOfTableAction(tableName),
+        new MoveRandomRegionOfTableAction(tableName)

Why do we need this?

-    this.maximumAttempts =
-      this.server.getConfiguration().getInt("hbase.assignment.maximum.attempts", 10);
+    this.maximumAttempts = Math.max(1,
+      this.server.getConfiguration().getInt("hbase.assignment.maximum.attempts", 10));

It could be configured to zero? You're saying try at least once?
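
A minimal sketch of what the clamp buys, assuming a plain Map stands in for HBase's Configuration object (the class and method names are hypothetical): a configured value of zero or less still yields one attempt, while the default and any positive value pass through unchanged.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the Math.max(1, ...) clamp from the patch.
 * A Map<String, Integer> stands in for HBase's Configuration.
 */
class AssignmentAttempts {
  static final String KEY = "hbase.assignment.maximum.attempts";
  static final int DEFAULT_ATTEMPTS = 10;

  /** Returns the configured attempt count, but never less than one. */
  static int maximumAttempts(Map<String, Integer> conf) {
    int configured = conf.getOrDefault(KEY, DEFAULT_ATTEMPTS);
    // Without the clamp, 0 (or a negative value) would mean "never try to assign".
    return Math.max(1, configured);
  }

  public static void main(String[] args) {
    System.out.println(maximumAttempts(Map.of()));        // 10 (default)
    System.out.println(maximumAttempts(Map.of(KEY, 0)));  // 1 (clamped from 0)
    System.out.println(maximumAttempts(Map.of(KEY, 3)));  // 3
  }
}
```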

I suppose {{public Lock acquireLock(final String encodedName)}} has to be
public because SSH (ServerShutdownHandler) wants to use it too?
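
For context, a per-region lock table of the kind acquireLock implies can be sketched as below. This is a hypothetical stand-in, not the patch's implementation: locks are keyed by encoded region name, and exposing acquireLock lets a second caller such as the shutdown handler serialize with assign() on the same region.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical per-region lock table keyed by encoded region name. */
class RegionLocks {
  private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

  /** Returns the region's lock, already held by the calling thread. */
  public Lock acquireLock(final String encodedName) {
    // computeIfAbsent hands every caller the same lock instance for a region.
    ReentrantLock lock = locks.computeIfAbsent(encodedName, k -> new ReentrantLock());
    lock.lock();
    return lock;
  }

  /** True if the calling thread currently holds the region's lock. */
  boolean isHeldByCurrentThread(String encodedName) {
    ReentrantLock lock = locks.get(encodedName);
    return lock != null && lock.isHeldByCurrentThread();
  }
}
```

Callers would release in a finally block (lock = acquireLock(name); try { ... } finally { lock.unlock(); }). In this sketch the map is never pruned, so it grows with the number of distinct regions ever locked; a real implementation would likely want to drop unused entries.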

How long do servers stay in the dead servers list?

{code}
+        if (!region.isMetaRegion() &&
+            regionStates.wasRegionOnDeadServer(encodedName)) {
+          LOG.info("Skip assigning " + region.getRegionNameAsString()
+            + " because it's host " + regionStates.getLastRegionServerOfRegion(encodedName)
+            + " is dead but not processed");
+          // Make sure the region is offline so that SSH will assign it.
+          // Need to make sure we don't race with SSH.
+          regionOffline(region);
+          return;
+        }
{code}

I suppose it doesn't matter if a server stays in the dead servers list for a
long time, since each server has a startcode?
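
The startcode argument can be illustrated with a simplified identity type. Assuming dead servers are tracked by full server identity (host, port, startcode), as HBase's ServerName is, a restarted process at the same host and port gets a new startcode and so never matches the stale dead entry. The ServerId record and DeadServers class below are hypothetical (Java 16+ for the record).

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical simplified server identity. Record equality covers all three
 * fields, so a new startcode means a different server.
 */
record ServerId(String host, int port, long startcode) {}

/** Hypothetical dead-server set keyed by full identity. */
class DeadServers {
  private final Set<ServerId> dead = new HashSet<>();

  void add(ServerId server) {
    dead.add(server);
  }

  boolean isDead(ServerId server) {
    return dead.contains(server);
  }
}
```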

Does this big block of new code have to go into the middle of assign?  Can it 
be broken up a little into methods that are easier to grok?
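
One possible way to pull the dead-server check out of assign() is sketched below. The DeadServerView interface and AssignGuard class are hypothetical; only the two query method names mirror the quoted patch. The helper returns a reason to skip, or empty if assignment may proceed.

```java
import java.util.Optional;

/** Hypothetical view of the two region-state queries used in the quoted block. */
interface DeadServerView {
  boolean wasRegionOnDeadServer(String encodedName);

  String getLastRegionServerOfRegion(String encodedName);
}

final class AssignGuard {
  private AssignGuard() {}

  /**
   * Returns why assign() should skip this region, or empty if it may proceed.
   * A non-meta region whose last host died but has not been processed yet is
   * left for the shutdown handler, which assigns it after log splitting.
   */
  static Optional<String> skipReason(
      DeadServerView states, String encodedName, boolean isMetaRegion) {
    if (isMetaRegion || !states.wasRegionOnDeadServer(encodedName)) {
      return Optional.empty();
    }
    return Optional.of("its host " + states.getLastRegionServerOfRegion(encodedName)
        + " is dead but not processed");
  }
}
```

With a helper like this, assign() would only need to check the result, log it, mark the region offline, and return, keeping the race-sensitive condition in one named place.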

{code}
+        if (serverManager.isServerOnline(server) &&
+            (t instanceof java.net.SocketTimeoutException ||
+                t instanceof java.net.ConnectException)) {
{code}

Is it a good idea to insert this wait here for every exception? What if the
exception is an NSRE (NotServingRegionException)? Doesn't an NSRE indicate a live server?
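
A sketch of the narrower policy the question suggests: wait before retrying only on network-level failures against a server the master still believes is online, and never on an NSRE, since an NSRE is a reply from a live region server. The NotServingRegionException class here is a local stand-in, not HBase's class, and AssignRetryPolicy is hypothetical.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

/** Local stand-in for HBase's NotServingRegionException, for this sketch only. */
class NotServingRegionException extends IOException {
  NotServingRegionException(String msg) {
    super(msg);
  }
}

final class AssignRetryPolicy {
  private AssignRetryPolicy() {}

  /** Hypothetical helper deciding whether to pause before the next assign attempt. */
  static boolean shouldWaitBeforeRetry(Throwable t, boolean serverOnline) {
    // An NSRE came back from the region server, so the server is alive: no wait.
    // (Redundant with the checks below, but it states the intent explicitly.)
    if (t instanceof NotServingRegionException) {
      return false;
    }
    // Only timeouts and refused connections to a server still listed as online
    // suggest it may be dying, so only those justify waiting.
    return serverOnline
        && (t instanceof SocketTimeoutException || t instanceof ConnectException);
  }
}
```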

I cannot follow the big change in the middle. Can we have a note on what it
does?

I'd say declare and assign in one go:

+    lastAssignments = new HashMap<String, ServerName>();
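
Declaring and initializing the map in one statement also lets it be final. The hypothetical class below shows that, plus one way such a map could back a wasRegionOnDeadServer-style check. Server names are plain strings here (for example "host,port,startcode") rather than ServerName, to keep the sketch self-contained; it is not the patch's RegionStates code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical slice of a region-states class. */
class LastAssignments {
  // Declared and initialized in one go; final documents that it is never replaced.
  private final Map<String, String> lastAssignments = new HashMap<>();
  private final Set<String> deadServers = new HashSet<>();

  /** Remembers where a region was last opened. */
  synchronized void regionOnline(String encodedName, String serverName) {
    lastAssignments.put(encodedName, serverName);
  }

  /** Marks a server dead; its regions are now suspect until it is processed. */
  synchronized void serverDied(String serverName) {
    deadServers.add(serverName);
  }

  /** Called once the shutdown handler has finished with the server. */
  synchronized void serverProcessed(String serverName) {
    deadServers.remove(serverName);
  }

  /** True if the region's last known host is dead and not yet processed. */
  synchronized boolean wasRegionOnDeadServer(String encodedName) {
    String last = lastAssignments.get(encodedName);
    return last != null && deadServers.contains(last);
  }
}
```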

I like this map in RS.

Good stuff, Jimmy.
                
> Prevent region from assigning before log splitting is done
> ----------------------------------------------------------
>
>                 Key: HBASE-9514
>                 URL: https://issues.apache.org/jira/browse/HBASE-9514
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>            Priority: Blocker
>         Attachments: trunk-9514_v1.patch
>
>
> If a region is assigned before log splitting is done by the server shutdown 
> handler, the edits belonging to this region in the hlogs of the dead server 
> will be lost.
> Generally this is not an issue if users don't assign/unassign a region from
> the hbase shell or via hbase admin. These commands are marked for experts only
> in the hbase shell help too. However, chaos monkey doesn't care.
> If we can prevent such regions from being assigned at a bad time, it would
> make things a little safer.
