[
https://issues.apache.org/jira/browse/HDDS-12634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Krishna Kumar Asawa reassigned HDDS-12634:
------------------------------------------
Assignee: Sadanand Shenoy
> All read/write operations are failing with OMNotLeaderException
> ---------------------------------------------------------------
>
> Key: HDDS-12634
> URL: https://issues.apache.org/jira/browse/HDDS-12634
> Project: Apache Ozone
> Issue Type: Bug
> Components: OM
> Reporter: Jyotirmoy Sinha
> Assignee: Sadanand Shenoy
> Priority: Major
>
> Scenario -
> # Current load -
> *
> ** No. of keys - 25750384
> ** No. of pending key deletions - 94222
> ** No. of buckets - 1001
> ** No. of volumes - 251
> ** Data usage - 1.2 TB/52.7 TB
> ** Snapshot metrics -
> {code:java}
> "NumSnapshotCreates" : 4685,
> "NumSnapshotDeletes" : 2182,
> "NumSnapshotLists" : 9364,
> "NumSnapshotPurges" : 1228,
> "NumSnapshotCreateFails" : 0,
> "NumSnapshotDeleteFails" : 0,
> "NumSnapshotListFails" : 0,
> "NumSnapshotPurgeFails" : 0,
> "NumSnapshotActive" : 2503,
> "NumSnapshotDeleted" : 2182,
> "NumSnapshotReclaimed" : 0,
> {code}
> # At the start of the load, 1 OM was shut down and kept down for 2+ days.
> Only 2 OMs were up.
> # When that OM was started again, 1 of the other OMs was shut down. Again,
> only 2 OMs were up.
> # That OM was kept shut down for 3+ days. Throughout stages 3-4, all
> read/write operations failed with OMLeaderNotReadyException.
> # After all the OMs were started, read/write operations succeeded again.
> # Read/write operations were down for 2 days, even though 2 OMs were up and
> functioning.
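For context on why this downtime is unexpected: OM HA replicates through Ratis (Raft), where a majority of the ring is enough to elect a leader and commit requests. Below is a minimal sketch of that arithmetic, assuming a 3-OM ring as the scenario implies. The class and method names are illustrative only and are not Ozone code.

```java
public class QuorumSketch {
    // Smallest number of live peers that forms a Raft majority.
    static int majority(int ringSize) {
        return ringSize / 2 + 1;
    }

    // True when enough peers are up to elect a leader and commit entries.
    static boolean hasQuorum(int ringSize, int livePeers) {
        return livePeers >= majority(ringSize);
    }

    public static void main(String[] args) {
        // Assumed 3-OM ring with one OM down, as in stages 2-4 of the report.
        System.out.println(hasQuorum(3, 2));
        // For comparison: two OMs down, no majority.
        System.out.println(hasQuorum(3, 1));
    }
}
```

With 2 of 3 OMs up, quorum is intact, which matches the error text: om122 was elected leader but reported itself not ready to process requests.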
> Error -
> {code:java}
> # ozone sh volume create vol1
> 25/03/17 20:41:28 INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.om.exceptions.OMLeaderNotReadyException): om122 is Leader but not ready to process request yet.
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderNotReadyException(OzoneManagerProtocolServerSideTranslatorPB.java:239)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderErrorException(OzoneManagerProtocolServerSideTranslatorPB.java:231)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:222)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:174)
> at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:143)
> at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
> , while invoking $Proxy18.submitRequest over nodeId=om122,nodeAddress=vc0120.halxg.cloudera.com:9862 after 3 failover attempts. Trying to failover after sleeping for 4000ms.
> 25/03/17 20:41:32 INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.om.exceptions.OMLeaderNotReadyException): om122 is Leader but not ready to process request yet.
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderNotReadyException(OzoneManagerProtocolServerSideTranslatorPB.java:239)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderErrorException(OzoneManagerProtocolServerSideTranslatorPB.java:231)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:222)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:174)
> at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:143)
> at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
> , while invoking $Proxy18.submitRequest over nodeId=om122,nodeAddress=vc0120.halxg.cloudera.com:9862 after 4 failover attempts. Trying to failover after sleeping for 6000ms.
> {code}
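The two retry lines in the log report a 4000ms sleep after 3 failover attempts and a 6000ms sleep after 4. A linear schedule of (attempts - 1) * 2000ms fits both points. The sketch below only reproduces those two observations. The 2000ms base and the formula are inferred from the log, not taken from the Ozone client's retry policy or its configuration, and the class name is hypothetical.

```java
public class FailoverBackoffSketch {
    // Assumed base interval, inferred from the two log lines above;
    // not read from any Ozone configuration key.
    static final long BASE_WAIT_MS = 2000L;

    // Linear backoff that matches the two observed (attempts, sleep) pairs.
    static long waitMillis(int failoverAttempts) {
        if (failoverAttempts < 1) {
            throw new IllegalArgumentException("failoverAttempts must be >= 1");
        }
        return (failoverAttempts - 1) * BASE_WAIT_MS;
    }

    public static void main(String[] args) {
        // Print the schedule for the two attempt counts seen in the log.
        for (int attempts = 3; attempts <= 4; attempts++) {
            System.out.println(attempts + " attempts -> " + waitMillis(attempts) + "ms");
        }
    }
}
```

If this schedule holds, the client keeps returning to om122 with growing sleeps rather than finding a ready leader, which is consistent with the repeated identical stack traces.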