Shawn created HDDS-6345:
---------------------------
Summary: OM always runs OOM in Kubernetes
Key: HDDS-6345
URL: https://issues.apache.org/jira/browse/HDDS-6345
Project: Apache Ozone
Issue Type: Bug
Reporter: Shawn
I deployed Ozone 1.2.1 to Kubernetes with security enabled and with OM HA and
SCM HA. However, one of the OMs is repeatedly restarted by Kubernetes because of
OOM. Even after I assigned it 300 GB of memory, the OM keeps restarting with OOM.
After analysis, we found that the OOM is caused by RocksDB. When the OM
restarts, it first tries to open RocksDB. During this time, RocksDB tries
to run a compaction, which eventually runs out of memory. So there are three
questions:
1. Why did the OM get into this state?
2. Why does RocksDB need so much memory for the compaction?
3. How can this be resolved?
Some information that may be useful: we deployed OM HA directly rather than
migrating from a single OM to OM HA. The OM that has the issue is a follower,
not the leader. The underlying PVC is backed by SSD. Our traffic is mostly large
objects, each hundreds of GB in size.