[ 
https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986882#comment-16986882
 ] 

huhaiyang edited comment on HDFS-15024 at 12/3/19 1:11 PM:
-----------------------------------------------------------

[~xkrogen] [~csun] [~vagarychen] Thanks for your comments!
As I understand it, in the normal case we configure only two NameNodes:
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>

Currently,
nn1 is in active state
nn2 is in standby state

When the client connects to nn2 first, it has to fail over, and it quickly 
connects to nn1. However, if the connection to nn1 fails because of a network 
problem, the client sleeps for a period of time before the next (third) 
attempt.

HDFS-6440 added support for more than two NameNodes.
If we configure three NameNodes:
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2,nn3</value>
</property>
nn1 is in active state
nn2 is in standby state
nn3 is in standby state (or observer state)

When the client connects to nn2 first, it has to fail over, and it quickly 
connects to nn3. When it connects to nn3, it has to fail over again, and it 
quickly connects to nn1.
However, if the connection to nn1 fails because of a network problem, the 
client sleeps for a period of time before the next (fourth) attempt.

That is to say, the client needs to try every configured NameNode once.
Only if none of them meets the requirements does it need to sleep and then 
retry.

In FailoverOnNetworkExceptionRetry#getFailoverOrRetrySleepTime, I think it is 
more reasonable to use the number of NameNodes as a condition when calculating 
the sleep time (which is what the current v01 patch does).
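
To make the idea concrete, here is a minimal sketch of the rule described above. It is not the Hadoop source and not the attached patch: the class name FailoverSleepSketch, the method sleepTimeMillis, and its parameters are hypothetical names chosen for illustration, and the backoff is a plain capped doubling with no random jitter. The only behavior taken from the discussion is that no sleep happens until every configured NameNode has been tried once.

```java
// Hypothetical illustration only; names and backoff formula are assumptions.
final class FailoverSleepSketch {
    private FailoverSleepSketch() {}

    /**
     * @param failovers       number of failovers already performed
     * @param numNameNodes    number of NameNodes configured for the nameservice
     * @param baseDelayMillis base delay used for the backoff
     * @param maxDelayMillis  upper bound on the returned delay
     * @return 0 while some configured NameNode is still untried,
     *         otherwise a capped, doubling delay
     */
    static long sleepTimeMillis(int failovers, int numNameNodes,
                                long baseDelayMillis, long maxDelayMillis) {
        // The first (numNameNodes - 1) failovers only move the client on to
        // a NameNode it has not tried yet, so they should not sleep.
        if (failovers < numNameNodes - 1) {
            return 0L;
        }
        // Bound the shift so the multiplication cannot overflow a long
        // for realistic base delays.
        int exponent = Math.min(failovers, 30);
        return Math.min(baseDelayMillis * (1L << exponent), maxDelayMillis);
    }

    public static void main(String[] args) {
        // Print the delay before each failover for 2 and 3 NameNodes.
        for (int nn = 2; nn <= 3; nn++) {
            for (int failovers = 0; failovers < 4; failovers++) {
                System.out.println("NNs=" + nn + " failovers=" + failovers
                    + " sleepMillis=" + sleepTimeMillis(failovers, nn, 500L, 15000L));
            }
        }
    }
}
```

With two NameNodes the sketch sleeps from the second failover onward, which matches the two-NameNode case above. With three NameNodes the second failover (from nn3 to nn1) returns 0, and sleeping starts only with the third failover.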




> [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a 
> condition of calculation of sleep time
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15024
>                 URL: https://issues.apache.org/jira/browse/HDFS-15024
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.10.0, 3.3.0, 3.2.1
>            Reporter: huhaiyang
>            Priority: Major
>         Attachments: HDFS-15024.001.patch, client_error.log
>
>
> When the Observer NameNode (ONN) is enabled, the client configuration 
> contains three NameNodes, for example:
> <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn2,nn3,nn1</value>
> </property>
> Currently, 
> nn2 is in standby state
> nn3 is in observer state 
> nn1 is in active state
> When the user performs an HDFS operation, for example:
> ./bin/hadoop --loglevel debug fs 
> -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
>  -mkdir /user/haiyang1/test8
> the msync call has to be sent to nn1.
> The client actually connects to nn2 first, and a failover is required.
> It then connects to nn3, which does not meet the requirements either, so 
> another failover is needed, but this failover is performed only after 
> sleeping for a period of time.
> Finally, after that sleep, the request is sent to nn1 successfully.
> In FailoverOnNetworkExceptionRetry#getFailoverOrRetrySleepTime, the current 
> default implementation calculates a sleep time once more than one failover 
> has been performed.
> I think it is more reasonable to use the number of NameNodes as a condition 
> when calculating the sleep time.
> That is, in the current test, the failover after connecting to nn3 should 
> not sleep, and the client should connect directly to the next NameNode.
> See client_error.log for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
