[jira] [Commented] (HBASE-8099) ReplicationZookeeper.copyQueuesFromRSUsingMulti should not return any queues if it failed to execute.

Lars Hofhansl (JIRA) Wed, 13 Mar 2013 21:08:15 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602012#comment-13602012
 ]


Lars Hofhansl commented on HBASE-8099:
--------------------------------------

That works. Personally I'd probably just return queues in the first case and do 
a clear() for the second like this:
{code}
-      if (peerIdsToProcess == null) return null; // node already processed
+      if (peerIdsToProcess == null) return queues; // node already processed
...
       LOG.warn("Got exception in copyQueuesFromRSUsingMulti: ", e);
+      queues.clear();
{code}
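
A minimal, self-contained sketch of the contract this change aims for: the caller always gets a non-null map back, and the map is empty whenever the copy did not complete. The class name, the MultiOp interface, and the placeholder log names are hypothetical stand-ins, not the actual ReplicationZookeeper code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class CopyQueuesSketch {
    /** Hypothetical stand-in for the atomic ZooKeeper multi operation. */
    interface MultiOp {
        void execute() throws Exception;
    }

    static SortedMap<String, SortedSet<String>> copyQueues(
            List<String> peerIdsToProcess, MultiOp multi) {
        SortedMap<String, SortedSet<String>> queues = new TreeMap<>();
        if (peerIdsToProcess == null) {
            return queues; // node already processed: empty map, not null
        }
        try {
            for (String peerId : peerIdsToProcess) {
                SortedSet<String> logs = new TreeSet<>();
                logs.add("log-for-" + peerId); // placeholder entry
                queues.put(peerId, logs);
            }
            multi.execute(); // may fail after queues was partially filled
        } catch (Exception e) {
            System.err.println("Got exception in copyQueuesFromRSUsingMulti: " + e);
            queues.clear(); // never hand back queues we did not actually claim
        }
        return queues;
    }

    public static void main(String[] args) {
        List<String> peers = Arrays.asList("1", "2");
        System.out.println(copyQueues(peers, () -> { }).size());
        System.out.println(copyQueues(peers, () -> {
            throw new Exception("multi failed");
        }).isEmpty());
        System.out.println(copyQueues(null, () -> { }).isEmpty());
    }
}
```

With this shape the caller only needs to iterate over the returned map and never has to distinguish null from a failed copy.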

Maybe while we're at it, we could add a random jitter to the failover.
Add a Random member to ReplicationSourceManager and then do this in 
NodeFailoverWorker:
{code}
-        Thread.sleep(sleepBeforeFailover);
+        Thread.sleep(sleepBeforeFailover +
+            (long) (random.nextFloat() * sleepBeforeFailover));
{code}
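
To illustrate the effect of the jitter: Random.nextFloat() returns a value in [0.0, 1.0), so each regionserver would sleep roughly between one and two times sleepBeforeFailover, which spreads out the attempts to claim a dead server's queues. The sketch below is standalone; the class name, the helper method, and the 2000 ms base value are illustrative assumptions, not code from ReplicationSourceManager.

```java
import java.util.Random;

class FailoverJitterSketch {
    /** Same expression as in the proposed change, pulled out into a helper. */
    static long jitteredSleep(long sleepBeforeFailover, Random random) {
        return sleepBeforeFailover
            + (long) (random.nextFloat() * sleepBeforeFailover);
    }

    public static void main(String[] args) {
        Random random = new Random();
        long base = 2000L; // illustrative base sleep in milliseconds
        for (int i = 0; i < 3; i++) {
            // Each simulated server gets a different delay.
            System.out.println(jitteredSleep(base, random));
        }
    }
}
```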

                
> ReplicationZookeeper.copyQueuesFromRSUsingMulti should not return any queues 
> if it failed to execute.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-8099
>                 URL: https://issues.apache.org/jira/browse/HBASE-8099
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Himanshu Vashishtha
>            Priority: Blocker
>             Fix For: 0.94.7
>
>         Attachments: HBase-8099-94.patch, HBase-8099-94-v2.patch, 
> HBase-8099-trunk-2.patch, HBase-8099-trunk.patch
>
>
> We just ran into an interesting scenario. We restarted a cluster that was 
> set up as a replication source.
> The stop went cleanly.
> Upon restart *all* regionservers aborted within a few seconds with variations 
> of these errors:
> http://pastebin.com/3iQVuBqS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
