[ 
https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984772#comment-16984772
 ] 

huhaiyang commented on HDFS-15024:
----------------------------------

./bin/hadoop --loglevel debug fs 
-Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
  -mkdir /user/haiyang1/test8
...
19/11/29 14:26:55 DEBUG ipc.Client: The ping interval is 60000 ms.
19/11/29 14:26:55 DEBUG ipc.Client: Connecting to nn2/xx:8020
19/11/29 14:26:55 DEBUG ipc.Client: IPC Client (1337335626) connection to 
nn2/xx:8020 from hadoop: starting, having connections 1
19/11/29 14:26:55 DEBUG ipc.Client: IPC Client (1337335626) connection to 
nn2/xx:8020 from hadoop sending #0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.msync
19/11/29 14:26:55 DEBUG ipc.Client: IPC Client (1337335626) connection to 
nn2/xx:8020 from hadoop got value #0
19/11/29 14:26:55 DEBUG retry.RetryInvocationHandler: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): 
Operation category WRITE is not supported in state standby. Visit 
https://s.apache.org/sbnn-error
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2018)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1461)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.msync(NameNodeRpcServer.java:1384)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.msync(ClientNamenodeProtocolServerSideTranslatorPB.java:1907)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2815)
, while invoking $Proxy4.getFileInfo over 
[nn3/xx:8020,nn2/xx:8020,nn1/xx:8020]. Trying to failover immediately.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): 
Operation category WRITE is not supported in state standby. Visit 
https://s.apache.org/sbnn-error
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2018)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1461)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.msync(NameNodeRpcServer.java:1384)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.msync(ClientNamenodeProtocolServerSideTranslatorPB.java:1907)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2815)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1543)
        at org.apache.hadoop.ipc.Client.call(Client.java:1489)
        at org.apache.hadoop.ipc.Client.call(Client.java:1388)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
        at com.sun.proxy.$Proxy15.msync(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.msync(ClientNamenodeProtocolTranslatorPB.java:1958)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.initializeMsync(ObserverReadProxyProvider.java:318)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.access$500(ObserverReadProxyProvider.java:69)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider$ObserverReadInvocationHandler.invoke(ObserverReadProxyProvider.java:374)
        at com.sun.proxy.$Proxy4.getFileInfo(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy4.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1666)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1582)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1579)
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1594)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:65)
        at org.apache.hadoop.fs.Globber.doGlob(Globber.java:281)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:149)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:2092)
        at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:353)
        at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:250)
        at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:233)
        at 
org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:106)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:327)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:390)
19/11/29 14:26:55 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
19/11/29 14:26:55 DEBUG ipc.Client: getting client out of cache: 
org.apache.hadoop.ipc.Client@378542de
19/11/29 14:26:55 DEBUG ipc.Client: The ping interval is 60000 ms.
19/11/29 14:26:55 DEBUG ipc.Client: Connecting to nn3/xx
19/11/29 14:26:55 DEBUG ipc.Client: IPC Client (1337335626) connection to 
nn3/xx:8020 from hadoop: starting, having connections 2
19/11/29 14:26:55 DEBUG ipc.Client: IPC Client (1337335626) connection to 
nn3/xx:8020 from hadoop sending #0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.msync
19/11/29 14:26:55 DEBUG ipc.Client: IPC Client (1337335626) connection to 
nn3/xx:8020 from hadoop got value #0
19/11/29 14:26:55 INFO retry.RetryInvocationHandler: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): 
Operation category WRITE is not supported in state observer. Visit 
https://s.apache.org/sbnn-error
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2018)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1461)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.msync(NameNodeRpcServer.java:1384)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.msync(ClientNamenodeProtocolServerSideTranslatorPB.java:1907)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2815)
, while invoking $Proxy4.getFileInfo over [nn3/xx:8020,nn2/xx:8020,nn1/xx:8020] 
after 1 failover attempts. Trying to failover after sleeping for 1172ms.
19/11/29 14:26:56 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
19/11/29 14:26:56 DEBUG ipc.Client: getting client out of cache: 
org.apache.hadoop.ipc.Client@378542de
19/11/29 14:26:56 DEBUG ipc.Client: The ping interval is 60000 ms.
19/11/29 14:26:56 DEBUG ipc.Client: Connecting to nn1/xx:8020
19/11/29 14:26:56 DEBUG ipc.Client: IPC Client (1337335626) connection to 
nn1/xx:8020 from hadoop: starting, having connections 3
19/11/29 14:26:56 DEBUG ipc.Client: IPC Client (1337335626) connection to 
nn1/xx:8020 from hadoop sending #0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.msync
19/11/29 14:26:56 DEBUG ipc.Client: IPC Client (1337335626) connection to 
nn1/xx:8020 from hadoop got value #0
19/11/29 14:26:56 DEBUG ipc.ProtobufRpcEngine: Call: msync took 2ms
19/11/29 14:26:56 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
...
19/11/29 14:26:56 DEBUG ipc.ProtobufRpcEngine: Call: mkdirs took 25ms

> [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a 
> condition of calculation of sleep time
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15024
>                 URL: https://issues.apache.org/jira/browse/HDFS-15024
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.10.0, 3.3.0, 3.2.1
>            Reporter: huhaiyang
>            Priority: Major
>         Attachments: HDFS-15024.001.patch, client_error.log
>
>
> {code:java}
> When the Observer NameNode (ONN) is enabled, the client configuration lists 
> three NameNodes, for example:
> <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn2,nn3,nn1</value>
> </property>
> Currently:
> nn2 is in standby state
> nn3 is in observer state
> nn1 is in active state
> When a user runs an HDFS operation such as
> ./bin/hadoop --loglevel debug fs 
> -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
>  -mkdir /user/haiyang1/test8
> the msync call has to be served by nn1, the active NameNode.
> The client actually connects to nn2 first, which rejects the call, so a 
> failover is required.
> It then connects to nn3, which cannot serve msync either, so it must fail 
> over again, but this second failover is only performed after a sleep.
> Only after that sleep does the request finally succeed against nn1.
> In FailoverOnNetworkExceptionRetry, getFailoverOrRetrySleepTime currently 
> calculates a sleep time for every failover after the first one.
> I think it is more reasonable to use the number of NameNodes as a condition 
> when calculating the sleep time.
> That is, in the current test, the failover away from nn3 should not sleep; 
> the client should connect to the next NameNode directly.
> See client_error.log for details.
> {code}
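
The difference between the behavior described above and the proposal can be illustrated with a small standalone model. This is only a sketch under stated assumptions: it is not the Hadoop source and not the attached HDFS-15024.001.patch. The class and method names are hypothetical, the exponential backoff with random jitter is a simplification, and the 500 ms base and 15000 ms cap are assumed values chosen for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical standalone model; names and constants are assumptions.
class FailoverSleepSketch {

    // Simplified exponential backoff: base * 2^exponent, capped, then
    // scaled by a random factor in [0.5, 1.5).
    static long backoffMillis(int exponent, long baseMillis, long capMillis) {
        int shift = Math.min(exponent, 20); // clamp to avoid overflow
        long ceiling = Math.min(baseMillis * (1L << shift), capMillis);
        return (long) (ceiling * (ThreadLocalRandom.current().nextDouble() + 0.5));
    }

    // Behavior as described in the issue: only the first failover is free.
    static long currentSleepMillis(int failovers, long baseMillis, long capMillis) {
        return failovers == 0 ? 0 : backoffMillis(failovers, baseMillis, capMillis);
    }

    // Proposed idea: do not sleep until every configured NameNode has been
    // tried once, then back off as before.
    static long proposedSleepMillis(int failovers, int nameNodeCount,
                                    long baseMillis, long capMillis) {
        int freeFailovers = Math.max(nameNodeCount - 1, 1);
        if (failovers < freeFailovers) {
            return 0;
        }
        return backoffMillis(failovers - freeFailovers + 1, baseMillis, capMillis);
    }

    public static void main(String[] args) {
        long base = 500;   // assumed base sleep in ms
        long cap = 15000;  // assumed maximum sleep in ms
        int nameNodes = 3; // nn2 (standby), nn3 (observer), nn1 (active)
        for (int failovers = 0; failovers < nameNodes; failovers++) {
            System.out.println("failovers so far=" + failovers
                + " current=" + currentSleepMillis(failovers, base, cap) + "ms"
                + " proposed=" + proposedSleepMillis(failovers, nameNodes, base, cap) + "ms");
        }
    }
}
```

With three NameNodes, the model sleeps before the second failover (nn3 to nn1) under the current rule and does not sleep under the proposed rule. With two NameNodes both rules give the same result, so only Observer setups are affected. Under the assumed 500 ms base, the current rule yields a value between 500 and 1500 ms after one failover, which is consistent with the 1172ms seen in the log.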



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
