Sammi Chen created HDDS-5688:
--------------------------------

             Summary: Rpc should not retry if the exception is ContainerNotFoundException
                 Key: HDDS-5688
                 URL: https://issues.apache.org/jira/browse/HDDS-5688
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Sammi Chen
            Assignee: Sammi Chen


SCM HA is enabled. When "ozone admin container info" is run with a 
nonexistent container ID, the command retries many times before it stops. 
Here is the output of the first three retries:

Hadoop UGI authentication : TAUTH
com.google.protobuf.ServiceException: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException):
 Server:7aac262f-5828-448d-a1aa-cd8a3e344b4b is not the leader. Suggested 
leader is Server:qy-ozone-common-v1-scm-1.tencent-distribute.com:9860.
        at 
org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:106)
        at 
org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:191)
        at 
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:150)
        at 
org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
, while invoking $Proxy19.submitRequest over 
nodeId=scm2,nodeAddress=qy-ozone-common-v1-scm-2.tencent-distribute.com/11.32.183.209:9860
 after 1 failover attempts. Trying to failover after sleeping for 2000ms.
com.google.protobuf.ServiceException: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.scm.container.ContainerNotFoundException):
 ID #481
        at 
org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.lambda$getContainer$0(ContainerManagerImpl.java:147)
        at java.util.Optional.orElseThrow(Optional.java:290)
        at 
org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.getContainer(ContainerManagerImpl.java:147)
        at 
org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getContainerWithPipelineCommon(SCMClientProtocolServer.java:236)
        at 
org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getContainerWithPipeline(SCMClientProtocolServer.java:275)
        at 
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.getContainerWithPipeline(StorageContainerLocationProtocolServerSideTranslatorPB.java:396)
        at 
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:189)
        at 
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
        at 
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:155)
        at 
org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
, while invoking $Proxy19.submitRequest over 
nodeId=scm1,nodeAddress=qy-ozone-common-v1-scm-1.tencent-distribute.com/11.32.205.14:9860
 after 2 failover attempts. Trying to failover after sleeping for 2000ms.
com.google.protobuf.ServiceException: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException):
 Server:9e77f811-8df6-4a59-9642-0f40d6f01764 is not the leader. Suggested 
leader is Server:qy-ozone-common-v1-scm-1.tencent-distribute.com:9860.
        at 
org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:106)
        at 
org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:191)
        at 
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:150)
        at 
org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
, while invoking $Proxy19.submitRequest over 
nodeId=scm3,nodeAddress=qy-ozone-common-v1-scm-3.tencent-distribute.com/11.0.119.77:9860
 after 3 failover attempts. Trying to failover after sleeping for 2000ms.
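
The behaviour requested in the summary can be illustrated with a minimal, 
self-contained sketch. It is not the actual Ozone or Hadoop code: 
RetryDecisionSketch, RemoteError, Action, decide and the limit of 15 
failovers are hypothetical names and values chosen for illustration. Only 
the two exception class names are taken from the output above. The idea is 
that a ServerNotLeaderException is worth a failover, while a 
ContainerNotFoundException is a definitive answer from the leader and should 
fail immediately.

```java
import java.util.Set;

public class RetryDecisionSketch {

    /** Hypothetical stand-in for a server-side exception reported by class name. */
    static final class RemoteError extends Exception {
        private final String className;

        RemoteError(String className, String message) {
            super(message);
            this.className = className;
        }

        String getClassName() {
            return className;
        }
    }

    /** Hypothetical outcome of a retry decision. */
    enum Action { FAIL, FAILOVER_AND_RETRY }

    /** Exceptions that are a final answer; retrying them cannot succeed. */
    static final Set<String> NON_RETRIABLE = Set.of(
        "org.apache.hadoop.hdds.scm.container.ContainerNotFoundException");

    /**
     * Decides whether the client should fail over and retry.
     * maxFailovers is an assumed client-side limit, not a real config key.
     */
    static Action decide(RemoteError e, int failovers, int maxFailovers) {
        if (NON_RETRIABLE.contains(e.getClassName())) {
            return Action.FAIL;               // the leader answered; stop here
        }
        if (failovers >= maxFailovers) {
            return Action.FAIL;               // retry budget exhausted
        }
        return Action.FAILOVER_AND_RETRY;     // e.g. ServerNotLeaderException
    }

    public static void main(String[] args) {
        RemoteError notLeader = new RemoteError(
            "org.apache.hadoop.hdds.ratis.ServerNotLeaderException",
            "not the leader");
        RemoteError notFound = new RemoteError(
            "org.apache.hadoop.hdds.scm.container.ContainerNotFoundException",
            "ID #481");

        System.out.println(decide(notLeader, 1, 15));
        System.out.println(decide(notFound, 2, 15));
    }
}
```

With a check of this kind, the second attempt shown above (the one answered 
by scm1 with ContainerNotFoundException) would end the command instead of 
triggering a third failover.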



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
