Shawn created HDDS-6345:
---------------------------

             Summary: OM always runs OOM in Kubernetes 
                 Key: HDDS-6345
                 URL: https://issues.apache.org/jira/browse/HDDS-6345
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Shawn


I deployed Ozone 1.21 to Kubernetes with security enabled, OM HA and
SCM HA. However, one of the OMs keeps getting restarted by Kubernetes because
it runs out of memory (OOM). Even after I assigned it 300 GB of memory, the OM
still keeps restarting due to OOM.

 

After analysis, we found that the OOM is caused by RocksDB. When the OM
restarts, it first opens RocksDB, and during this time RocksDB starts a
compaction, which eventually runs out of memory. So there are three questions:

 

1. Why did the OM get into this state?
2. Why does RocksDB need so much memory for the compaction?
3. How can this be resolved?

Some information that may be useful: we deployed OM HA directly rather than
migrating from a single OM to HA OM. The OM with the issue is a follower, not
the leader. The underlying PVC is backed by SSD. Our traffic is mostly large
objects, each hundreds of GBs in size.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)