sri created HDDS-10402:
--------------------------
Summary: OM unstable with Leader not ready yet errors
Key: HDDS-10402
URL: https://issues.apache.org/jira/browse/HDDS-10402
Project: Apache Ozone
Issue Type: Bug
Components: 1.3, Ozone Manager
Affects Versions: 1.3.0
Environment: *Any*
Reporter: sri
Fix For: 1.3.0
After restarting Ozone Manager (OM), we see a considerable degradation in
Ozone performance. Specifically, read and write operations are slower than
normal. The following errors also appear repeatedly in the OM logs.
+*>> Error Log: (OM)*+
2024-02-15 11:36:05,949 INFO
org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Received
Configuration change notification from Ratis. New Peer list:
[id: "om1"
address: "xyz:9872"
startupRole: LEADER ----> A (New) --> Token auth failing om user
, id: "om3"
address: "B:9872"
startupRole: FOLLOWER
, id: "om2"
address: "C:9872"
startupRole: FOLLOWER ----> C (New) --> Token auth failing om user
]
2024-02-15 14:02:20,852 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth
failed for abc:45174:null (DIGEST-MD5: IO error acquiring password) with true
cause: (om1 is Leader but not ready to process request yet.)
2024-02-15 14:02:20,852 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth
failed for xyz:42414:null (DIGEST-MD5: IO error acquiring password) with true
cause: (om1 is Leader but not ready to process request yet.)
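To gauge how often this failure repeats, the OM log can be scanned for the "Leader but not ready" cause. The following is a minimal Python sketch, not part of Ozone; the helper name `count_not_ready` and the regular expression are assumptions based only on the two log lines quoted above.

```python
import re
from collections import Counter

# Assumed pattern, derived from the quoted log lines: captures the OM node id
# that reports itself as leader but not yet ready.
NOT_READY_RE = re.compile(
    r"\((\S+) is Leader but not ready to process request yet\.\)"
)

def count_not_ready(log_text):
    """Count 'Leader but not ready' auth failures per OM node id."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return Counter(NOT_READY_RE.findall(flat))

sample = """2024-02-15 14:02:20,852 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth
failed for abc:45174:null (DIGEST-MD5: IO error acquiring password) with true
cause: (om1 is Leader but not ready to process request yet.)
2024-02-15 14:02:20,852 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth
failed for xyz:42414:null (DIGEST-MD5: IO error acquiring password) with true
cause: (om1 is Leader but not ready to process request yet.)"""

counts = count_not_ready(sample)
print(dict(counts))
```

Running it over a full OM log instead of `sample` gives a per-node count that can be compared before and after a restart.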
+*>> Long, persistent JVM pause cycles during the leader election process:*+
2024-02-15 11:36:05,892 INFO org.apache.ratis.server.impl.RoleInfo: om1:
*shutdown om1@group-3B1F193E2D90-LeaderStateImpl*
2024-02-15 11:36:05,893 WARN org.apache.ratis.util.JvmPauseMonitor:
JvmPauseMonitor-om1: *Detected pause in JVM or host machine (eg GC): pause of
approximately 19274374277ns.*
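The pause is reported in nanoseconds, which is hard to read at a glance. The following is a small Python sketch that extracts the value from a JvmPauseMonitor line and converts it to seconds; the helper name `pause_seconds` and the regular expression are illustrative assumptions based on the single log line quoted above.

```python
import re

# Assumed pattern, derived from the quoted log line; \s+ tolerates the line
# wrapping between "pause of" and "approximately".
PAUSE_RE = re.compile(r"pause of\s+approximately\s+(\d+)ns")

def pause_seconds(log_text):
    """Return every reported JVM pause found in log_text, in seconds."""
    return [int(ns) / 1e9 for ns in PAUSE_RE.findall(log_text)]

sample = """2024-02-15 11:36:05,893 WARN org.apache.ratis.util.JvmPauseMonitor:
JvmPauseMonitor-om1: *Detected pause in JVM or host machine (eg GC): pause of
approximately 19274374277ns.*"""

pauses = pause_seconds(sample)
print([round(p, 2) for p in pauses])  # prints [19.27]
```

For the line quoted here, the reported pause works out to roughly 19.3 seconds.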
+*>> Recon Log:*+
2024-02-15 23:21:09,029 ERROR org.apache.hadoop.ozone.recon.tasks.OMDBUpdatesHandler: Exception when reading key :
java.io.IOException: Rocks Database is closed
at org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:407)
at org.apache.hadoop.hdds.utils.db.RocksDatabase.get(RocksDatabase.java:641)
at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:110)
at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:40)
at org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(TypedTable.java:255)
at org.apache.hadoop.hdds.utils.db.TypedTable.getSkipCache(TypedTable.java:195)
at org.apache.hadoop.ozone.recon.tasks.OMDBUpdatesHandler.processEvent(OMDBUpdatesHandler.java:128)
at org.apache.hadoop.ozone.recon.tasks.OMDBUpdatesHandler.put(OMDBUpdatesHandler.java:67)
at org.rocksdb.WriteBatch.iterate(Native Method)
at org.rocksdb.WriteBatch.iterate(WriteBatch.java:63)