[jira] [Work logged] (HDFS-16514) Reduce the failover sleep time if multiple namenode are configured

ASF GitHub Bot (Jira) Thu, 14 Apr 2022 04:15:04 -0700


     [ 
https://issues.apache.org/jira/browse/HDFS-16514?focusedWorklogId=756928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756928
 ]


ASF GitHub Bot logged work on HDFS-16514:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Apr/22 11:14
            Start Date: 14/Apr/22 11:14
    Worklog Time Spent: 10m 
      Work Description: liubingxing commented on code in PR #4088:
URL: https://github.com/apache/hadoop/pull/4088#discussion_r850340871


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryPolicies.java:
##########
@@ -639,19 +647,24 @@ public FailoverOnNetworkExceptionRetry(RetryPolicy fallbackPolicy,
     
     public FailoverOnNetworkExceptionRetry(RetryPolicy fallbackPolicy,
         int maxFailovers, int maxRetries, long delayMillis, long maxDelayBase) {
+      this(fallbackPolicy, maxFailovers, maxRetries, delayMillis, maxDelayBase, 2);
+    }
+    public FailoverOnNetworkExceptionRetry(RetryPolicy fallbackPolicy,
+        int maxFailovers, int maxRetries, long delayMillis, long maxDelayBase, int nnSize) {
       this.fallbackPolicy = fallbackPolicy;
       this.maxFailovers = maxFailovers;
       this.maxRetries = maxRetries;
       this.delayMillis = delayMillis;
       this.maxDelayBase = maxDelayBase;
+      this.nnSize = nnSize;
     }
 
     /**
      * @return 0 if this is our first failover/retry (i.e., retry immediately),

Review Comment:
   @cndaimin Thanks for the review. I will add the comments later.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 756928)
    Time Spent: 1h 10m  (was: 1h)

> Reduce the failover sleep time if multiple namenode are configured
> ------------------------------------------------------------------
>
>                 Key: HDFS-16514
>                 URL: https://issues.apache.org/jira/browse/HDFS-16514
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: qinyuren
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2022-03-21-18-11-37-191.png
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Recently, we used the [Standby Read] feature in our test cluster and
> deployed 4 namenodes as follows:
> node1 -> active nn
> node2 -> standby nn
> node3 -> observer nn
> node4 -> observer nn
> If we set 'dfs.client.failover.random.order=true', the client may fail over
> twice and wait a long time before sending msync to the active namenode.
> !image-2022-03-21-18-11-37-191.png|width=698,height=169!
> I think we can reduce the sleep time of the first several failovers based on
> the number of namenodes.
> For example, if 4 namenodes are configured, the sleep time of the first three
> failover operations is set to zero.
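
The proposal in the description can be sketched as follows. This is a minimal,
self-contained illustration and not the code in PR #4088: the class
FailoverSleepSketch and the method sleepMillis are hypothetical names, the
exponent offset after the zero-sleep failovers is an assumption chosen so that
nnSize = 2 reproduces a plain "zero, then exponential" schedule, and the random
jitter a real backoff would normally add is left out to keep the output
deterministic.

```java
// Hypothetical sketch of "skip the sleep for the first (nnSize - 1) failovers".
// Not the Hadoop implementation; names and the exact exponent are assumptions.
class FailoverSleepSketch {

  /**
   * @param failovers    number of failovers already performed (0 for the first one)
   * @param nnSize       number of configured namenodes
   * @param delayMillis  base delay for the exponential backoff
   * @param maxDelayBase upper bound for the computed delay
   * @return sleep time in milliseconds before the next failover
   */
  static long sleepMillis(int failovers, int nnSize, long delayMillis,
      long maxDelayBase) {
    // Treat fewer than two namenodes like the two-namenode default.
    int n = Math.max(nnSize, 2);
    // The first (n - 1) failovers can each reach a namenode that has not
    // been tried yet, so there is no reason to wait before them.
    if (failovers < n - 1) {
      return 0L;
    }
    // Afterwards back off exponentially. The shift is clamped so that the
    // multiplication cannot overflow for ordinary delay values.
    int exponent = Math.min(failovers - (n - 2), 30);
    return Math.min(delayMillis * (1L << exponent), maxDelayBase);
  }

  public static void main(String[] args) {
    // Four namenodes, 500 ms base delay, 15 s cap.
    for (int i = 0; i < 8; i++) {
      System.out.println("failover " + i + " -> "
          + sleepMillis(i, 4, 500L, 15000L) + " ms");
    }
  }
}
```

With four namenodes the first three failovers return 0 and the later ones grow
from the base delay up to the cap. With nnSize = 2 only the first failover is
immediate, which is consistent with the value 2 that the existing five-argument
constructor passes in the diff above.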



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

