[
https://issues.apache.org/jira/browse/HDDS-10402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821057#comment-17821057
]
sri commented on HDDS-10402:
----------------------------
Hi [~erose], [~szetszwo], thanks very much for your input.
We have not yet migrated to Ozone 1.4.0.
In the meantime, could you please point us to any known bug in the current 1.3.x version?
Thanks again.
> OM unstable with long jvm pauses
> --------------------------------
>
> Key: HDDS-10402
> URL: https://issues.apache.org/jira/browse/HDDS-10402
> Project: Apache Ozone
> Issue Type: Bug
> Components: 1.3, Ozone Manager
> Affects Versions: 1.3.0
> Environment: *Any*
> Reporter: sri
> Assignee: Sadanand Shenoy
> Priority: Major
> Fix For: 1.3.0
>
>
> When we restart Ozone Manager (OM), we notice a considerable degradation of
> Ozone performance. Specifically, read/write operations are slower than
> normal. We also see the following errors repeated in the OM logs.
>
> +*>> Error Log: (OM)*+
> 2024-02-15 11:36:05,949 INFO
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Received
> Configuration change notification from Ratis. New Peer list:
> [id: "om1"
> address: "xyz:9872"
> startupRole: LEADER ----> A (New) --> Token auth failing om user
> , id: "om3"
> address: "B:9872"
> startupRole: FOLLOWER
> , id: "om2"
> address: "C:9872"
> startupRole: FOLLOWER ----> C (New) --> Token auth failing om user
> ]
> 2024-02-15 14:02:20,852 WARN SecurityLogger.org.apache.hadoop.ipc.Server:
> Auth failed for abc:45174:null (DIGEST-MD5: IO error acquiring password) with
> true cause: (om1 is Leader but not ready to process request yet.)
> 2024-02-15 14:02:20,852 WARN SecurityLogger.org.apache.hadoop.ipc.Server:
> Auth failed for xyz:42414:null (DIGEST-MD5: IO error acquiring password) with
> true cause: (om1 is Leader but not ready to process request yet.)
>
> +*>> Long and persistent JVM pause cycles during the leader election process:*+
> 2024-02-15 11:36:05,892 INFO org.apache.ratis.server.impl.RoleInfo: om1:
> *shutdown om1@group-3B1F193E2D90-LeaderStateImpl*
> 2024-02-15 11:36:05,893 WARN org.apache.ratis.util.JvmPauseMonitor:
> JvmPauseMonitor-om1: *Detected pause in JVM or host machine (eg GC): pause of
> approximately 19274374277ns.*
>
> +*>> Recon Log:*+
> 2024-02-15 23:21:09,029 ERROR
> org.apache.hadoop.ozone.recon.tasks.OMDBUpdatesHandler: Exception when
> reading key :
> java.io.IOException: Rocks Database is closed
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:407)
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase.get(RocksDatabase.java:641)
> at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:110)
> at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:40)
> at
> org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(TypedTable.java:255)
> at
> org.apache.hadoop.hdds.utils.db.TypedTable.getSkipCache(TypedTable.java:195)
> at
> org.apache.hadoop.ozone.recon.tasks.OMDBUpdatesHandler.processEvent(OMDBUpdatesHandler.java:128)
> at
> org.apache.hadoop.ozone.recon.tasks.OMDBUpdatesHandler.put(OMDBUpdatesHandler.java:67)
> at org.rocksdb.WriteBatch.iterate(Native Method)
> at org.rocksdb.WriteBatch.iterate(WriteBatch.java:63)
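
For reference, the JvmPauseMonitor line quoted above reports the pause in nanoseconds; 19274374277 ns is roughly 19.3 seconds. Below is a minimal Python sketch for pulling such durations out of OM logs, assuming the message format shown in the quoted log. The function names and the 1-second threshold are illustrative only and are not part of Ozone or Ratis.

```python
import re

# Matches the duration in a JvmPauseMonitor warning such as
# "pause of approximately 19274374277ns". Using \s+ tolerates the
# line wrapping seen in the quoted log.
PAUSE_RE = re.compile(r"pause\s+of\s+approximately\s+(\d+)\s*ns")


def strip_quote_prefix(text):
    """Drop leading '> ' email quote markers from every line."""
    return "\n".join(re.sub(r"^>\s?", "", line) for line in text.splitlines())


def pause_seconds(log_text):
    """Return every reported pause, in seconds, in order of appearance."""
    cleaned = strip_quote_prefix(log_text)
    return [int(ns) / 1e9 for ns in PAUSE_RE.findall(cleaned)]


def long_pauses(log_text, threshold_s=1.0):
    """Return only the pauses above threshold_s (an assumed cutoff)."""
    return [p for p in pause_seconds(log_text) if p > threshold_s]


if __name__ == "__main__":
    sample = (
        "> JvmPauseMonitor-om1: *Detected pause in JVM or host machine "
        "(eg GC): pause of\n"
        "> approximately 19274374277ns.*"
    )
    for pause in long_pauses(sample):
        print(f"pause of {pause:.2f} s")
```

Running this over a full OM log would show whether the long pauses cluster around the leader election timestamps.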