[ https://issues.apache.org/jira/browse/HDDS-10402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820836#comment-17820836 ]
Ethan Rose commented on HDDS-10402:
-----------------------------------
[~sri9], does the issue still occur with Ozone 1.4.0? I believe there are some memory
fixes in the OM in that release that 1.3.0 does not have.
> OM unstable with long jvm pauses
> --------------------------------
>
> Key: HDDS-10402
> URL: https://issues.apache.org/jira/browse/HDDS-10402
> Project: Apache Ozone
> Issue Type: Bug
> Components: 1.3, Ozone Manager
> Affects Versions: 1.3.0
> Environment: *Any*
> Reporter: sri
> Assignee: Sadanand Shenoy
> Priority: Major
> Fix For: 1.3.0
>
>
> After restarting the Ozone Manager (OM), we noticed considerable degradation in
> Ozone performance: reads and writes are slower than normal. We also see the
> following errors repeatedly in the OM logs.
>
> +*>> Error Log: (OM)*+
> 2024-02-15 11:36:05,949 INFO org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Received Configuration change notification from Ratis. New Peer list:
> [id: "om1"
> address: "xyz:9872"
> startupRole: LEADER ----> A (New) --> Token auth failing om user
> , id: "om3"
> address: "B:9872"
> startupRole: FOLLOWER
> , id: "om2"
> address: "C:9872"
> startupRole: FOLLOWER ----> C (New) --> Token auth failing om user
> ]
> 2024-02-15 14:02:20,852 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for abc:45174:null (DIGEST-MD5: IO error acquiring password) with true cause: (om1 is Leader but not ready to process request yet.)
> 2024-02-15 14:02:20,852 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for xyz:42414:null (DIGEST-MD5: IO error acquiring password) with true cause: (om1 is Leader but not ready to process request yet.)
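The two auth failures quoted above are stamped 14:02:20,852, while the Ratis configuration-change notification naming om1 as LEADER is stamped 11:36:05,949. The short Python sketch below computes the gap between those two log timestamps. It only measures the distance between these two lines; it does not by itself show that om1 stayed "not ready" for the whole interval. The helper names `parse_log_ts` and `log_gap` are illustrative, not part of Ozone.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the quoted logs, e.g. "2024-02-15 11:36:05,949"
# (date, time, then milliseconds after a comma).
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_ts(ts: str) -> datetime:
    """Parse a log timestamp; %f accepts the 3-digit millisecond field."""
    return datetime.strptime(ts, LOG_TS_FORMAT)


def log_gap(earlier: str, later: str) -> timedelta:
    """Elapsed time between two log timestamps."""
    return parse_log_ts(later) - parse_log_ts(earlier)


if __name__ == "__main__":
    # Configuration-change notification vs. the "not ready" auth failures.
    print(log_gap("2024-02-15 11:36:05,949", "2024-02-15 14:02:20,852"))
    # prints 2:26:14.903000
```

In other words, the "Leader but not ready" failures were still being logged roughly 2 hours 26 minutes after the configuration change.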
>
> +*>> Long, persistent JVM pause cycles during the leader election process:*+
> 2024-02-15 11:36:05,892 INFO org.apache.ratis.server.impl.RoleInfo: om1: *shutdown om1@group-3B1F193E2D90-LeaderStateImpl*
> 2024-02-15 11:36:05,893 WARN org.apache.ratis.util.JvmPauseMonitor: JvmPauseMonitor-om1: *Detected pause in JVM or host machine (eg GC): pause of approximately 19274374277ns.*
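The JvmPauseMonitor line reports the pause in nanoseconds: 19274374277 ns is about 19.27 seconds. Pause monitors of this kind (Hadoop's `JvmPauseMonitor` works this way, and Ratis's monitor is assumed here to follow the same idea) sleep for a short fixed interval and treat any extra wall-clock time beyond it as a stall of the JVM or host. The Python sketch below illustrates that sleep-and-measure technique under assumed values (a 0.5 s sleep interval and a 1 s reporting threshold). It is not the Ratis implementation, and `detect_pauses` is a hypothetical helper. The clock and sleep functions are injectable so the logic can be exercised without a real GC pause.

```python
import time
from typing import Callable, List

NS_PER_SECOND = 1_000_000_000


def ns_to_seconds(ns: int) -> float:
    """Convert a nanosecond duration, as printed in the log, to seconds."""
    return ns / NS_PER_SECOND


def detect_pauses(
    iterations: int,
    interval_s: float = 0.5,   # assumed sleep interval
    threshold_s: float = 1.0,  # assumed reporting threshold
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[float]:
    """Return the extra time (seconds) of every sleep that overshot by more
    than threshold_s. A stop-the-world GC or a stalled host makes the sleep
    return late, and the overshoot approximates the length of the pause."""
    pauses = []
    for _ in range(iterations):
        start = clock()
        sleep(interval_s)
        extra = (clock() - start) - interval_s
        if extra > threshold_s:
            pauses.append(extra)
    return pauses


if __name__ == "__main__":
    print(f"{ns_to_seconds(19274374277):.2f} s")  # prints 19.27 s
```

In a real monitor this loop would run on a daemon thread and log each overshoot rather than return a list; the injected `clock` and `sleep` exist only to make the sketch testable.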
>
> +*>> Recon Log:*+
> 2024-02-15 23:21:09,029 ERROR org.apache.hadoop.ozone.recon.tasks.OMDBUpdatesHandler: Exception when reading key :
> java.io.IOException: Rocks Database is closed
> at org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:407)
> at org.apache.hadoop.hdds.utils.db.RocksDatabase.get(RocksDatabase.java:641)
> at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:110)
> at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:40)
> at org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(TypedTable.java:255)
> at org.apache.hadoop.hdds.utils.db.TypedTable.getSkipCache(TypedTable.java:195)
> at org.apache.hadoop.ozone.recon.tasks.OMDBUpdatesHandler.processEvent(OMDBUpdatesHandler.java:128)
> at org.apache.hadoop.ozone.recon.tasks.OMDBUpdatesHandler.put(OMDBUpdatesHandler.java:67)
> at org.rocksdb.WriteBatch.iterate(Native Method)
> at org.rocksdb.WriteBatch.iterate(WriteBatch.java:63)
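In the Recon trace, `RocksDatabase.get` calls `assertClose` before reading, and that check throws `java.io.IOException: Rocks Database is closed`, which indicates the underlying database had already been closed when `OMDBUpdatesHandler.processEvent` tried to read through its table handle. The Python sketch below mimics that guard-before-read pattern with a hypothetical `GuardedStore` class, to show why a handle obtained earlier starts failing once `close()` is called. It is not Ozone code and does not model why or when Recon closes its database.

```python
from typing import Dict, Optional


class DatabaseClosedError(IOError):
    """Stand-in for the IOException raised on access after close."""


class GuardedStore:
    """Hypothetical key-value store that checks an 'open' flag before each access."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._closed = False

    def _assert_open(self) -> None:
        # Fail fast instead of touching resources that were already released.
        if self._closed:
            raise DatabaseClosedError("database is closed")

    def put(self, key: str, value: str) -> None:
        self._assert_open()
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        self._assert_open()
        return self._data.get(key)

    def close(self) -> None:
        self._closed = True
        self._data.clear()


if __name__ == "__main__":
    store = GuardedStore()
    store.put("/vol/bucket/key1", "v1")
    store.close()  # e.g. the database is closed while a handler still holds it
    try:
        store.get("/vol/bucket/key1")
    except DatabaseClosedError as e:
        print(f"read failed: {e}")  # prints "read failed: database is closed"
```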
--
This message was sent by Atlassian Jira
(v8.20.10#820010)