[
https://issues.apache.org/jira/browse/HDDS-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sammi Chen resolved HDDS-5688.
------------------------------
Resolution: Fixed
> Rpc should not retry if exception is ContainerNotFoundException
> ---------------------------------------------------------------
>
> Key: HDDS-5688
> URL: https://issues.apache.org/jira/browse/HDDS-5688
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
> Labels: pull-request-available
>
> SCM HA is enabled. When run the "ozone admin container info" with non existed
> container ID, the command will retry many times before stop. Here is the
> first three retry output,
> Hadoop UGI authentication : TAUTH
> com.google.protobuf.ServiceException:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException):
> Server:7aac262f-5828-448d-a1aa-cd8a3e344b4b is not the leader. Suggested
> leader is Server:qy-ozone-common-v1-scm-1.tencent-distribute.com:9860.
> at
> org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:106)
> at
> org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:191)
> at
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:150)
> at
> org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
> , while invoking $Proxy19.submitRequest over
> nodeId=scm2,nodeAddress=qy-ozone-common-v1-scm-2.tencent-distribute.com/11.32.183.209:9860
> after 1 failover attempts. Trying to failover after sleeping for 2000ms.
> com.google.protobuf.ServiceException:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.scm.container.ContainerNotFoundException):
> ID #481
> at
> org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.lambda$getContainer$0(ContainerManagerImpl.java:147)
> at java.util.Optional.orElseThrow(Optional.java:290)
> at
> org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.getContainer(ContainerManagerImpl.java:147)
> at
> org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getContainerWithPipelineCommon(SCMClientProtocolServer.java:236)
> at
> org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getContainerWithPipeline(SCMClientProtocolServer.java:275)
> at
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.getContainerWithPipeline(StorageContainerLocationProtocolServerSideTranslatorPB.java:396)
> at
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:189)
> at
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
> at
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:155)
> at
> org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
> , while invoking $Proxy19.submitRequest over
> nodeId=scm1,nodeAddress=qy-ozone-common-v1-scm-1.tencent-distribute.com/11.32.205.14:9860
> after 2 failover attempts. Trying to failover after sleeping for 2000ms.
> com.google.protobuf.ServiceException:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException):
> Server:9e77f811-8df6-4a59-9642-0f40d6f01764 is not the leader. Suggested
> leader is Server:qy-ozone-common-v1-scm-1.tencent-distribute.com:9860.
> at
> org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:106)
> at
> org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:191)
> at
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:150)
> at
> org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
> , while invoking $Proxy19.submitRequest over
> nodeId=scm3,nodeAddress=qy-ozone-common-v1-scm-3.tencent-distribute.com/11.0.119.77:9860
> after 3 failover attempts. Trying to failover after sleeping for 2000ms.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]