[
https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986891#comment-16986891
]
huhaiyang edited comment on HDFS-15024 at 12/3/19 1:26 PM:
-----------------------------------------------------------
The current implementation in RetryPolicies:
{code:java}
/**
 * @return 0 if this is our first failover/retry (i.e., retry immediately),
 *         sleep exponentially otherwise
 */
private long getFailoverOrRetrySleepTime(int times) {
  return times == 0 ? 0 :
      calculateExponentialTime(delayMillis, times, maxDelayBase);
}
{code}
It would be reasonable to take the number of NameNodes into account when
calculating the sleep duration.
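
To make the idea concrete, here is a minimal, self-contained sketch. It is not the attached patch: the class name, the `proposedSleepTime` method, and the `numNameNodes` parameter are hypothetical, and `exponentialTime` is a simplified stand-in for `calculateExponentialTime` that assumes a plain capped doubling with no random jitter. The delay values in `main` are illustrative only.

```java
// Hypothetical sketch; names and the simplified backoff are assumptions,
// not the actual RetryPolicies code or the HDFS-15024 patch.
final class FailoverSleepSketch {

    // Simplified stand-in for calculateExponentialTime: capped doubling,
    // no jitter (assumption).
    static long exponentialTime(long delayMillis, int times, long maxDelayBase) {
        int shift = Math.min(times, 30); // keep the shift small to avoid overflow
        return Math.min(delayMillis * (1L << shift), maxDelayBase);
    }

    // Behaviour of the method quoted above: only the first failover is free.
    static long currentSleepTime(int failovers, long delayMillis, long maxDelayBase) {
        return failovers == 0 ? 0 : exponentialTime(delayMillis, failovers, maxDelayBase);
    }

    // Hypothetical variant: do not sleep until every configured NameNode
    // has been tried once, then back off exponentially.
    static long proposedSleepTime(int failovers, int numNameNodes,
                                  long delayMillis, long maxDelayBase) {
        // With N NameNodes, N - 1 failovers are enough to visit each one once.
        int freeFailovers = Math.max(numNameNodes - 1, 1);
        if (failovers < freeFailovers) {
            return 0;
        }
        return exponentialTime(delayMillis, failovers - freeFailovers + 1, maxDelayBase);
    }

    public static void main(String[] args) {
        long delayMillis = 500;     // illustrative base delay
        long maxDelayBase = 15000;  // illustrative cap
        for (int failovers = 0; failovers < 4; failovers++) {
            System.out.println("failovers=" + failovers
                + " current=" + currentSleepTime(failovers, delayMillis, maxDelayBase)
                + " proposed(3 NNs)=" + proposedSleepTime(failovers, 3, delayMillis, maxDelayBase));
        }
    }
}
```

With two NameNodes this variant behaves like the current code. With three (for example standby, observer, active) the second failover no longer sleeps, so a client that tries nn2 and then nn3 can reach nn1 without waiting.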
was (Author: haiyang hu):
In RetryPolicies Implemented
/**
* @return 0 if this is our first failover/retry (i.e., retry immediately),
* sleep exponentially otherwise
*/
private long getFailoverOrRetrySleepTime(int times) {
return times == 0 ? 0 :
calculateExponentialTime(delayMillis, times, maxDelayBase);
}
> [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a
> condition of calculation of sleep time
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-15024
> URL: https://issues.apache.org/jira/browse/HDFS-15024
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.10.0, 3.3.0, 3.2.1
> Reporter: huhaiyang
> Priority: Major
> Attachments: HDFS-15024.001.patch, client_error.log
>
>
> When we enable the Observer NameNode (ONN), the client configuration lists
> three NameNodes, for example:
> <property>
> <name>dfs.ha.namenodes.ns1</name>
> <value>nn2,nn3,nn1</value>
> </property>
> Currently,
> nn2 is in standby state
> nn3 is in observer state
> nn1 is in active state
> When the user runs an HDFS operation such as:
> ./bin/hadoop --loglevel debug fs
> -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
> -mkdir /user/haiyang1/test8
> the msync call has to be served by nn1, the active NameNode.
> However, the client connects to nn2 first, so a failover is required.
> The next connection, to nn3, does not meet the requirements either, so
> another failover is needed, but this time the client sleeps for a while
> before failing over.
> Only after that sleep does the request reach nn1 and succeed.
> In FailoverOnNetworkExceptionRetry#getFailoverOrRetrySleepTime, the current
> default implementation applies a sleep time to every failover after the
> first one.
> I think it is more reasonable to take the number of NameNodes into account
> when calculating the sleep time.
> That is, in the test above, the failover after connecting to nn3 should not
> sleep and should connect directly to the next NameNode.
> See client_error.log for details.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)