[
https://issues.apache.org/jira/browse/HDDS-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499114#comment-17499114
]
Ritesh H Shukla commented on HDDS-6345:
---------------------------------------
More Slack discussion:
https://the-asf.slack.com/archives/C5RK7PWA1/p1645033763755759
> OM always runs OOM in Kubernetes
> ---------------------------------
>
> Key: HDDS-6345
> URL: https://issues.apache.org/jira/browse/HDDS-6345
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Shawn
> Priority: Major
>
> I deployed ozone 1.21 to Kubernetes with security enabled and with OM HA and
> SCM HA. However, one of the OMs always gets restarted by Kubernetes because of
> OOM. Even after I assigned 300 GB of memory, the OM still keeps restarting due
> to OOM.
>
> After analysis, we found the OOM was caused by RocksDB. When the OM gets
> restarted, it first tries to open RocksDB. During this time, RocksDB
> tries to do compaction, which eventually runs out of memory. So there are
> three questions:
>
> 1. Why did the OM get into this state?
> 2. Why does RocksDB need so much memory to do the compaction?
> 3. How can this be resolved?
>
> Some info that may be useful: we deployed OM HA directly rather than
> migrating from a single OM to OM HA. The OM that has issues is a follower,
> not a leader. The underlying PVC we are using is backed by SSD. Our traffic
> is mostly large objects, hundreds of GBs in size.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)