[jira] [Assigned] (HDDS-12634) All read/write operations are failing with OMNotLeaderException

Krishna Kumar Asawa (Jira) Wed, 19 Mar 2025 22:08:56 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-12634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Krishna Kumar Asawa reassigned HDDS-12634:
------------------------------------------

    Assignee: Sadanand Shenoy

> All read/write operations are failing with OMNotLeaderException
> ---------------------------------------------------------------
>
>                 Key: HDDS-12634
>                 URL: https://issues.apache.org/jira/browse/HDDS-12634
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM
>            Reporter: Jyotirmoy Sinha
>            Assignee: Sadanand Shenoy
>            Priority: Major
>
> Scenario -
>  # Current load -
>  ** No. of keys - 25750384
>  ** No. of pending key deletions - 94222
>  ** No. of buckets - 1001
>  ** No. of volumes - 251
>  ** Data usage - 1.2 TB/52.7 TB
>  ** Snapshot metrics -
> {code:java}
>     "NumSnapshotCreates" : 4685,
>     "NumSnapshotDeletes" : 2182,
>     "NumSnapshotLists" : 9364,
>     "NumSnapshotPurges" : 1228,
>     "NumSnapshotCreateFails" : 0,
>     "NumSnapshotDeleteFails" : 0,
>     "NumSnapshotListFails" : 0,
>     "NumSnapshotPurgeFails" : 0,
>     "NumSnapshotActive" : 2503,
>     "NumSnapshotDeleted" : 2182,
>     "NumSnapshotReclaimed" : 0, {code}
>  # At the start of the load, 1 OM was shut down and kept in that state for
> 2+ days. Only 2 OMs were up.
>  # When that OM was started again, 1 of the other OMs was shut down. Again,
> only 2 OMs were up.
>  # That second OM was kept shut down for 3+ days. During steps 3-4, all
> read/write operations were failing with OMLeaderNotReadyException.
>  # After all the OMs were started, read/write operations resumed.
>  # Read/write operations were unavailable for 2 days, even though 2 OMs
> were up and functioning.
> Error - 
> {code:java}
> # ozone sh volume create vol1
> 25/03/17 20:41:28 INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.om.exceptions.OMLeaderNotReadyException): om122 is Leader but not ready to process request yet.
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderNotReadyException(OzoneManagerProtocolServerSideTranslatorPB.java:239)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderErrorException(OzoneManagerProtocolServerSideTranslatorPB.java:231)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:222)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:174)
>     at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:143)
>     at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
> , while invoking $Proxy18.submitRequest over nodeId=om122,nodeAddress=vc0120.halxg.cloudera.com:9862 after 3 failover attempts. Trying to failover after sleeping for 4000ms.
> 25/03/17 20:41:32 INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.om.exceptions.OMLeaderNotReadyException): om122 is Leader but not ready to process request yet.
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderNotReadyException(OzoneManagerProtocolServerSideTranslatorPB.java:239)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderErrorException(OzoneManagerProtocolServerSideTranslatorPB.java:231)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:222)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:174)
>     at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:143)
>     at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
> , while invoking $Proxy18.submitRequest over nodeId=om122,nodeAddress=vc0120.halxg.cloudera.com:9862 after 4 failover attempts. Trying to failover after sleeping for 6000ms. {code}
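
Note on the retry log above: the two entries report a 4000 ms sleep after 3 failover attempts and a 6000 ms sleep after 4. Those two data points are consistent with a wait that grows linearly by 2000 ms per attempt. The sketch below only illustrates that reading. The class name, the method names, and the 2000 ms base are assumptions inferred from the log, not Ozone's actual retry policy code, and the values for attempts other than 3 and 4 are extrapolated.

```java
/** Hypothetical illustration of the wait schedule seen in the log; not Ozone code. */
public class FailoverBackoffSketch {
    // Assumed base wait, inferred from the 4000 ms and 6000 ms values in the log.
    static final long BASE_WAIT_MS = 2000L;

    /** Sleep before the next failover, given the failover attempts made so far. */
    static long waitMillis(int failoverAttempts) {
        if (failoverAttempts < 1) {
            throw new IllegalArgumentException("failoverAttempts must be >= 1");
        }
        // Linear growth: one extra BASE_WAIT_MS per attempt after the first.
        return BASE_WAIT_MS * (failoverAttempts - 1);
    }

    /** Total time spent sleeping across attempts 1..failoverAttempts. */
    static long totalWaitMillis(int failoverAttempts) {
        long total = 0;
        for (int n = 1; n <= failoverAttempts; n++) {
            total += waitMillis(n);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println("after " + n + " failover attempts: sleep "
                + waitMillis(n) + " ms");
        }
    }
}
```

Under this assumed schedule the client keeps retrying against om122 with growing pauses, which matches the observation that requests never succeed while the leader stays in the "not ready" state.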



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
