Jyotirmoy Sinha created HDDS-12634:
--------------------------------------

             Summary: All read/write operations are failing with 
OMNotLeaderException
                 Key: HDDS-12634
                 URL: https://issues.apache.org/jira/browse/HDDS-12634
             Project: Apache Ozone
          Issue Type: Bug
          Components: OM
            Reporter: Jyotirmoy Sinha


Scenario -
 # Current load -
 ** No. of keys - 25750384
 ** No. of pending key deletions - 94222
 ** No. of buckets - 1001
 ** No. of volumes - 251
 ** Data usage - 1.2 TB/52.7 TB
 ** Snapshot metrics -

{code:java}
    "NumSnapshotCreates" : 4685,
    "NumSnapshotDeletes" : 2182,
    "NumSnapshotLists" : 9364,
    "NumSnapshotPurges" : 1228,
    "NumSnapshotCreateFails" : 0,
    "NumSnapshotDeleteFails" : 0,
    "NumSnapshotListFails" : 0,
    "NumSnapshotPurgeFails" : 0,
    "NumSnapshotActive" : 2503,
    "NumSnapshotDeleted" : 2182,
    "NumSnapshotReclaimed" : 0,
{code}
 # At the start of the load, 1 OM was shut down and kept in that state for 2+ 
days. Only 2 OMs were up.
 # When that OM was started again, 1 of the other OMs was shut down. Again, 
only 2 OMs were up.
 # That second OM was kept shut down for 3+ days. Throughout stages 3-4, all 
read/write operations were failing with OMLeaderNotReadyException.
 # After all the OMs were started, read/write operations resumed.
 # Read/write operations were unavailable for 2 days, even though 2 OMs were 
up and functioning.
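
For reference, the expectation that 2 running OMs should be enough can be sketched with plain majority arithmetic. This is only an illustration and not Ozone code: the class and method names are hypothetical, and the ring size of 3 is an assumption inferred from the scenario above (1 OM down, 2 OMs up). OM HA is built on Apache Ratis, a Raft implementation, where a strict majority of peers is needed to elect a leader and commit log entries.

```java
// Hypothetical helper, not part of Ozone: majority arithmetic for a Raft-style ring.
final class QuorumSketch {

    // Smallest number of peers that forms a strict majority of ringSize.
    static int majority(int ringSize) {
        if (ringSize <= 0) {
            throw new IllegalArgumentException("ringSize must be positive");
        }
        return ringSize / 2 + 1;
    }

    // True when the peers that are up can still form a majority.
    static boolean hasQuorum(int ringSize, int peersUp) {
        return peersUp >= majority(ringSize);
    }

    public static void main(String[] args) {
        int ringSize = 3; // assumption: three OMs in the HA ring
        for (int up = 0; up <= ringSize; up++) {
            System.out.println(up + " of " + ringSize
                + " OMs up -> quorum: " + hasQuorum(ringSize, up));
        }
    }
}
```

Under that assumption, hasQuorum(3, 2) is true, so loss of quorum alone would not be expected to cause the outage. That fits the error below, which reports that om122 is the leader but is not yet ready to process requests.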

Error - 
{code:java}
# ozone sh volume create vol1
25/03/17 20:41:28 INFO retry.RetryInvocationHandler: 
com.google.protobuf.ServiceException: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.om.exceptions.OMLeaderNotReadyException):
 om122 is Leader but not ready to process request yet.
    at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderNotReadyException(OzoneManagerProtocolServerSideTranslatorPB.java:239)
    at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderErrorException(OzoneManagerProtocolServerSideTranslatorPB.java:231)
    at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:222)
    at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:174)
    at 
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
    at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:143)
    at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
    at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
, while invoking $Proxy18.submitRequest over 
nodeId=om122,nodeAddress=vc0120.halxg.cloudera.com:9862 after 3 failover 
attempts. Trying to failover after sleeping for 4000ms.
25/03/17 20:41:32 INFO retry.RetryInvocationHandler: 
com.google.protobuf.ServiceException: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.om.exceptions.OMLeaderNotReadyException):
 om122 is Leader but not ready to process request yet.
    at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderNotReadyException(OzoneManagerProtocolServerSideTranslatorPB.java:239)
    at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderErrorException(OzoneManagerProtocolServerSideTranslatorPB.java:231)
    at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:222)
    at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:174)
    at 
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
    at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:143)
    at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
    at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
, while invoking $Proxy18.submitRequest over 
nodeId=om122,nodeAddress=vc0120.halxg.cloudera.com:9862 after 4 failover 
attempts. Trying to failover after sleeping for 6000ms.
{code}


