[ 
https://issues.apache.org/jira/browse/HDDS-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499114#comment-17499114
 ] 

Ritesh H Shukla commented on HDDS-6345:
---------------------------------------

More Slack discussion:
https://the-asf.slack.com/archives/C5RK7PWA1/p1645033763755759

> OM always runs OOM in Kubernetes 
> ---------------------------------
>
>                 Key: HDDS-6345
>                 URL: https://issues.apache.org/jira/browse/HDDS-6345
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Shawn
>            Priority: Major
>
> I deployed Ozone 1.21 to Kubernetes with security enabled and with OM HA and
> SCM HA. However, one of the OMs keeps getting restarted by Kubernetes because
> of OOM. Even after I assigned 300GB of memory, the OM still keeps restarting
> due to OOM.
>  
> After analysis, we found that the OOM was caused by RocksDB. When the OM is
> restarted, it first tries to open RocksDB. During this time, RocksDB
> tries to run compaction, which eventually runs out of memory. So there are
> three questions:
>  
> 1. Why did the OM get into this state?
> 2. Why does RocksDB need so much memory for compaction?
> 3. How can this be resolved?
>
> Some info that may be useful: we deployed OM HA directly rather than
> migrating from a single OM to OM HA. The OM that has issues is a follower,
> not a leader. The underlying PVC is backed by SSD. Our traffic is mostly
> large objects, hundreds of GBs in size.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
