sri created HDDS-10402:
--------------------------

             Summary: OM unstable with Leader not ready yet errors
                 Key: HDDS-10402
                 URL: https://issues.apache.org/jira/browse/HDDS-10402
             Project: Apache Ozone
          Issue Type: Bug
          Components: 1.3, Ozone Manager
    Affects Versions: 1.3.0
         Environment: *Any*
            Reporter: sri
             Fix For: 1.3.0


When we restart Ozone Manager (OM), we see a considerable degradation in 
Ozone performance. Specifically, read and write operations are slower than 
normal. We also see the following errors repeated in the OM logs.

 

+*>> Error Log: (OM)*+

2024-02-15 11:36:05,949 INFO 
org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Received 
Configuration change notification from Ratis. New Peer list:
[id: "om1"
address: "xyz:9872"
startupRole: LEADER  ----> A (New) --> Token auth failing om user
, id: "om3"
address: "B:9872"
startupRole: FOLLOWER
, id: "om2"
address: "C:9872"
startupRole: FOLLOWER  ----> C (New) --> Token auth failing om user
]

2024-02-15 14:02:20,852 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth 
failed for abc:45174:null (DIGEST-MD5: IO error acquiring password) with true 
cause: (om1 is Leader but not ready to process request yet.)
2024-02-15 14:02:20,852 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth 
failed for xyz:42414:null (DIGEST-MD5: IO error acquiring password) with true 
cause: (om1 is Leader but not ready to process request yet.)

 

+*>> Long, persistent JVM pause cycles during the leader election process:*+

2024-02-15 11:36:05,892 INFO org.apache.ratis.server.impl.RoleInfo: om1: 
*shutdown om1@group-3B1F193E2D90-LeaderStateImpl*
2024-02-15 11:36:05,893 WARN org.apache.ratis.util.JvmPauseMonitor: 
JvmPauseMonitor-om1: *Detected pause in JVM or host machine (eg GC): pause of 
approximately 19274374277ns.*
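
For scale, the pause reported above is roughly 19.3 seconds. Below is a minimal sketch, assuming the JvmPauseMonitor message format shown above, for listing such pauses from an OM log. The function name and the 1-second threshold are illustrative choices, not part of Ozone or Ratis.

```python
import re

# Matches the duration in a JvmPauseMonitor warning. \s+ tolerates line
# wraps like the ones in the excerpt above.
PAUSE_RE = re.compile(r"pause of\s+approximately\s+(\d+)ns")


def find_pauses_seconds(log_text, threshold_s=1.0):
    """Return pause durations in seconds that are at or above threshold_s.

    Hypothetical helper for triage; the threshold is an arbitrary default.
    """
    pauses = [int(ns) / 1e9 for ns in PAUSE_RE.findall(log_text)]
    return [p for p in pauses if p >= threshold_s]


# The log line from this report, joined onto one line.
sample = (
    "2024-02-15 11:36:05,893 WARN org.apache.ratis.util.JvmPauseMonitor: "
    "JvmPauseMonitor-om1: Detected pause in JVM or host machine (eg GC): "
    "pause of approximately 19274374277ns."
)

print(find_pauses_seconds(sample))
```

Running this over the full OM log and comparing the timestamps of long pauses with the leader election messages may help show whether the pauses line up with the "Leader but not ready" window.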

 

+*>> Recon Log:*+

2024-02-15 23:21:09,029 ERROR org.apache.hadoop.ozone.recon.tasks.OMDBUpdatesHandler: Exception when reading key :
java.io.IOException: Rocks Database is closed
        at org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:407)
        at org.apache.hadoop.hdds.utils.db.RocksDatabase.get(RocksDatabase.java:641)
        at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:110)
        at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:40)
        at org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(TypedTable.java:255)
        at org.apache.hadoop.hdds.utils.db.TypedTable.getSkipCache(TypedTable.java:195)
        at org.apache.hadoop.ozone.recon.tasks.OMDBUpdatesHandler.processEvent(OMDBUpdatesHandler.java:128)
        at org.apache.hadoop.ozone.recon.tasks.OMDBUpdatesHandler.put(OMDBUpdatesHandler.java:67)
        at org.rocksdb.WriteBatch.iterate(Native Method)
        at org.rocksdb.WriteBatch.iterate(WriteBatch.java:63)



