Sammi Chen created HDDS-5688:
--------------------------------
Summary: Rpc should not retry if the exception is
ContainerNotFoundException
Key: HDDS-5688
URL: https://issues.apache.org/jira/browse/HDDS-5688
Project: Apache Ozone
Issue Type: Bug
Reporter: Sammi Chen
Assignee: Sammi Chen
SCM HA is enabled. When run the "ozone admin container info" with non existed
container ID, the command will retry many times before stop. Here is the
first three retry output,
Hadoop UGI authentication : TAUTH
com.google.protobuf.ServiceException:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException):
Server:7aac262f-5828-448d-a1aa-cd8a3e344b4b is not the leader. Suggested
leader is Server:qy-ozone-common-v1-scm-1.tencent-distribute.com:9860.
at
org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:106)
at
org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:191)
at
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:150)
at
org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
, while invoking $Proxy19.submitRequest over
nodeId=scm2,nodeAddress=qy-ozone-common-v1-scm-2.tencent-distribute.com/11.32.183.209:9860
after 1 failover attempts. Trying to failover after sleeping for 2000ms.
com.google.protobuf.ServiceException:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.scm.container.ContainerNotFoundException):
ID #481
at
org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.lambda$getContainer$0(ContainerManagerImpl.java:147)
at java.util.Optional.orElseThrow(Optional.java:290)
at
org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.getContainer(ContainerManagerImpl.java:147)
at
org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getContainerWithPipelineCommon(SCMClientProtocolServer.java:236)
at
org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getContainerWithPipeline(SCMClientProtocolServer.java:275)
at
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.getContainerWithPipeline(StorageContainerLocationProtocolServerSideTranslatorPB.java:396)
at
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:189)
at
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
at
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:155)
at
org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
, while invoking $Proxy19.submitRequest over
nodeId=scm1,nodeAddress=qy-ozone-common-v1-scm-1.tencent-distribute.com/11.32.205.14:9860
after 2 failover attempts. Trying to failover after sleeping for 2000ms.
com.google.protobuf.ServiceException:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException):
Server:9e77f811-8df6-4a59-9642-0f40d6f01764 is not the leader. Suggested
leader is Server:qy-ozone-common-v1-scm-1.tencent-distribute.com:9860.
at
org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:106)
at
org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:191)
at
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:150)
at
org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
, while invoking $Proxy19.submitRequest over
nodeId=scm3,nodeAddress=qy-ozone-common-v1-scm-3.tencent-distribute.com/11.0.119.77:9860
after 3 failover attempts. Trying to failover after sleeping for 2000ms.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]