[ https://issues.apache.org/jira/browse/HDFS-16514?focusedWorklogId=756928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756928 ]
ASF GitHub Bot logged work on HDFS-16514:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 14/Apr/22 11:14
Start Date: 14/Apr/22 11:14
Worklog Time Spent: 10m
Work Description: liubingxing commented on code in PR #4088:
URL: https://github.com/apache/hadoop/pull/4088#discussion_r850340871
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryPolicies.java:
##########
@@ -639,19 +647,24 @@ public FailoverOnNetworkExceptionRetry(RetryPolicy fallbackPolicy,
     public FailoverOnNetworkExceptionRetry(RetryPolicy fallbackPolicy,
         int maxFailovers, int maxRetries, long delayMillis, long maxDelayBase) {
+      this(fallbackPolicy, maxFailovers, maxRetries, delayMillis, maxDelayBase, 2);
+    }
+    public FailoverOnNetworkExceptionRetry(RetryPolicy fallbackPolicy,
+        int maxFailovers, int maxRetries, long delayMillis, long maxDelayBase, int nnSize) {
       this.fallbackPolicy = fallbackPolicy;
       this.maxFailovers = maxFailovers;
       this.maxRetries = maxRetries;
       this.delayMillis = delayMillis;
       this.maxDelayBase = maxDelayBase;
+      this.nnSize = nnSize;
     }
     /**
      * @return 0 if this is our first failover/retry (i.e., retry immediately),
Review Comment:
@cndaimin Thanks for the review. I will add the comments later.
Issue Time Tracking
-------------------
Worklog Id: (was: 756928)
Time Spent: 1h 10m (was: 1h)
> Reduce the failover sleep time if multiple namenode are configured
> ------------------------------------------------------------------
>
> Key: HDFS-16514
> URL: https://issues.apache.org/jira/browse/HDFS-16514
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: qinyuren
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-03-21-18-11-37-191.png
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Recently, we used the [Standby Read] feature in our test cluster and
> deployed 4 namenodes as follows:
> node1 -> active nn
> node2 -> standby nn
> node3 -> observer nn
> node4 -> observer nn
> If we set 'dfs.client.failover.random.order=true', the client may fail over
> twice and wait a long time before sending msync to the active namenode.
> !image-2022-03-21-18-11-37-191.png|width=698,height=169!
> I think we can reduce the sleep time of the first several failovers based on
> the number of namenodes.
> For example, if 4 namenodes are configured, the sleep time of the first three
> failover operations is set to zero.
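The proposal in the issue description can be illustrated with a small, standalone sketch. This is not the code from PR #4088: the class name `FailoverSleepSketch`, the method `sleepMillis`, and the exact exponent used after the zero-delay phase are assumptions made for illustration, and the random jitter that a real backoff would normally apply is left out so the numbers are deterministic. The only behavior taken from the description is that the first `nnSize - 1` failovers do not sleep; with `nnSize = 2` this reduces to "retry immediately on the first failover".

```java
// Hypothetical sketch of the idea in HDFS-16514; not the actual Hadoop implementation.
final class FailoverSleepSketch {

  /**
   * @param failovers    number of failovers already performed (0 for the first one)
   * @param nnSize       number of configured namenodes (assumed >= 2)
   * @param delayMillis  base delay for exponential backoff
   * @param maxDelayBase upper bound on the computed delay
   * @return 0 while there are still untried namenodes, otherwise a capped
   *         exponential delay (jitter intentionally omitted in this sketch)
   */
  static long sleepMillis(int failovers, int nnSize, long delayMillis, long maxDelayBase) {
    // With N namenodes, the first N - 1 failovers each reach a node that
    // has not been tried yet, so sleeping before them only adds latency.
    if (failovers < nnSize - 1) {
      return 0;
    }
    // Assumed exponent: chosen so that nnSize == 2 gives delayMillis * 2^failovers.
    int exponent = failovers - (nnSize - 2);
    long delay = delayMillis;
    // Double step by step and stop once the cap is reached.
    for (int i = 0; i < exponent && delay < maxDelayBase; i++) {
      delay *= 2;
    }
    return Math.min(delay, maxDelayBase);
  }

  public static void main(String[] args) {
    // Example values only: 4 namenodes, 500 ms base delay, 15 s cap.
    for (int failovers = 0; failovers < 8; failovers++) {
      System.out.println("failover " + failovers + " -> "
          + sleepMillis(failovers, 4, 500, 15000) + " ms");
    }
  }
}
```

With four namenodes, failovers 0 through 2 return zero and backoff starts at the fourth attempt, which matches the example in the description. Passing `nnSize = 2` mirrors the default that the five-argument constructor in the diff supplies.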