Jyotirmoy Sinha created HDDS-12634:
--------------------------------------
Summary: All read/write operations are failing with
OMNotLeaderException
Key: HDDS-12634
URL: https://issues.apache.org/jira/browse/HDDS-12634
Project: Apache Ozone
Issue Type: Bug
Components: OM
Reporter: Jyotirmoy Sinha
Scenario -
# Current load -
*
** No. of keys - 25750384
** No. of pending key deletions - 94222
** No. of buckets - 1001
** No. of volumes - 251
** Data usage - 1.2 TB/52.7 TB
** Snapshot metrics -
{code:java}
"NumSnapshotCreates" : 4685,
"NumSnapshotDeletes" : 2182,
"NumSnapshotLists" : 9364,
"NumSnapshotPurges" : 1228,
"NumSnapshotCreateFails" : 0,
"NumSnapshotDeleteFails" : 0,
"NumSnapshotListFails" : 0,
"NumSnapshotPurgeFails" : 0,
"NumSnapshotActive" : 2503,
"NumSnapshotDeleted" : 2182,
"NumSnapshotReclaimed" : 0, {code}
# At the start of the load, 1 OM was shut down and kept in that state for 2+
days, leaving only 2 OMs up.
# When that OM was started again, 1 of the other OMs was shut down, so again
only 2 OMs were up.
# This second OM was kept shut down for 3+ days. Throughout stages 3-4, all
read/write operations failed with OMLeaderNotReadyException.
# After all the OMs were started, read/write operations resumed.
# Read/write operations were unavailable for 2 days, even though 2 OMs were
up and functioning.
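For context on why this downtime is unexpected: OM HA replicates through Apache Ratis (a Raft implementation), which only needs a majority of the ring to be up. The sketch below is illustrative arithmetic, not Ozone code. It assumes a 3-OM ring, which the scenario implies but does not state, and the class and method names are hypothetical.

```java
public class QuorumCheck {
    /** Smallest number of peers that forms a strict majority of the ring. */
    public static int majority(int ringSize) {
        return ringSize / 2 + 1;
    }

    /** True when enough peers are up for a Raft group to elect a leader and commit. */
    public static boolean hasQuorum(int ringSize, int peersUp) {
        return peersUp >= majority(ringSize);
    }

    public static void main(String[] args) {
        int ringSize = 3; // assumption: three OMs in the HA ring
        int peersUp = 2;  // the state during stages 2-4 of the scenario
        System.out.println("majority=" + majority(ringSize)
            + " hasQuorum=" + hasQuorum(ringSize, peersUp));
    }
}
```

With 2 of 3 OMs up the majority condition holds, and the error message shows that om122 had been elected leader. This suggests the outage came from the leader not becoming ready rather than from a lost quorum.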
Error -
{code:java}
# ozone sh volume create vol1
25/03/17 20:41:28 INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.om.exceptions.OMLeaderNotReadyException): om122 is Leader but not ready to process request yet.
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderNotReadyException(OzoneManagerProtocolServerSideTranslatorPB.java:239)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderErrorException(OzoneManagerProtocolServerSideTranslatorPB.java:231)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:222)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:174)
        at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:143)
        at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
, while invoking $Proxy18.submitRequest over nodeId=om122,nodeAddress=vc0120.halxg.cloudera.com:9862 after 3 failover attempts. Trying to failover after sleeping for 4000ms.
25/03/17 20:41:32 INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.om.exceptions.OMLeaderNotReadyException): om122 is Leader but not ready to process request yet.
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderNotReadyException(OzoneManagerProtocolServerSideTranslatorPB.java:239)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createLeaderErrorException(OzoneManagerProtocolServerSideTranslatorPB.java:231)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:222)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:174)
        at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:143)
        at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
, while invoking $Proxy18.submitRequest over nodeId=om122,nodeAddress=vc0120.halxg.cloudera.com:9862 after 4 failover attempts. Trying to failover after sleeping for 6000ms.
{code}
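When triaging a long client log like the one above, it can help to pull out just the retry progression (which node was tried, the failover attempt count, and the sleep before the next attempt). The following is a minimal sketch that assumes the log lines keep the wording shown in this report. The class `FailoverLogParser` and its nested `Attempt` type are hypothetical helpers, not part of Ozone or Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FailoverLogParser {
    /** One retry record extracted from the client log. */
    public static final class Attempt {
        public final String nodeId;
        public final String nodeAddress;
        public final int failoverAttempts;
        public final long sleepMillis;

        Attempt(String nodeId, String nodeAddress, int failoverAttempts, long sleepMillis) {
            this.nodeId = nodeId;
            this.nodeAddress = nodeAddress;
            this.failoverAttempts = failoverAttempts;
            this.sleepMillis = sleepMillis;
        }
    }

    // \s+ between words tolerates line wrapping introduced by mail clients.
    private static final Pattern RETRY = Pattern.compile(
        "nodeId=([^,\\s]+),nodeAddress=(\\S+)\\s+after\\s+(\\d+)\\s+failover\\s+attempts\\."
        + "\\s+Trying\\s+to\\s+failover\\s+after\\s+sleeping\\s+for\\s+(\\d+)ms");

    /** Returns every retry record found in the log text, in order of appearance. */
    public static List<Attempt> parse(String log) {
        List<Attempt> attempts = new ArrayList<>();
        Matcher m = RETRY.matcher(log);
        while (m.find()) {
            attempts.add(new Attempt(m.group(1), m.group(2),
                Integer.parseInt(m.group(3)), Long.parseLong(m.group(4))));
        }
        return attempts;
    }

    public static void main(String[] args) {
        String sample = ", while invoking $Proxy18.submitRequest over\n"
            + "nodeId=om122,nodeAddress=vc0120.halxg.cloudera.com:9862 after 3 failover\n"
            + "attempts. Trying to failover after sleeping for 4000ms.";
        for (Attempt a : parse(sample)) {
            System.out.println(a.nodeId + " attempts=" + a.failoverAttempts
                + " sleepMs=" + a.sleepMillis);
        }
    }
}
```

Applied to the two entries in this report, such a parser would show the same node (om122) being retried at attempts 3 and 4 with sleeps of 4000 ms and 6000 ms.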