[
https://issues.apache.org/jira/browse/HDDS-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385480#comment-17385480
]
Janus Chow commented on HDDS-5472:
----------------------------------
I think it has the same source, but the outcome and solution proposed are
different.
In our case, we do see some WARN logs, but the size is mostly around 2MB, so
the WARN log isn't what we are trying to solve here. The outcome in our case is
an OOM of OM, which leaves OM unavailable to the cluster.
In HDDS-5243, we talked about the root cause, which is the design of
OmKeyLocationInfoGroup. This design has been hurting not only the IPC
response size but also RocksDB. While HDDS-5243 and HDDS-5393 try to fix this
issue on the surface, I think this ticket tries to fix it at the very source.
As I computed in this comment
[https://github.com/apache/ozone/pull/2261#issuecomment-843308499], we used to
want to avoid this redundancy only in the IPC response, but I think it should
be avoided at a deeper level, such as the RocksDB side and
[prepareKeyInfo|https://github.com/apache/ozone/blob/23352f818e74b1e59b48ec63adce28cb87c1603c/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyRequest.java#L611].
I think it would be good to recheck the design of OmKeyLocationInfoGroup here.
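
To illustrate the kind of redundancy in question, here is a minimal sketch. It
uses simplified stand-in classes, not the actual Ozone classes, and it assumes
that each new version adds one block and that each version's group carries over
the locations of all earlier versions. Names such as Group, buildRedundant and
buildCompact are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocationRedundancySketch {

  // Hypothetical stand-in for OmKeyLocationInfoGroup: one group per version,
  // with a map from version number to that version's block locations.
  static final class Group {
    final long version;
    final Map<Long, List<String>> locationVersionMap = new HashMap<>();

    Group(long version) {
      this.version = version;
    }
  }

  // Assumed redundant layout: every new group copies the map entries of the
  // previous group and then adds the single block of its own version.
  static List<Group> buildRedundant(int versions) {
    List<Group> groups = new ArrayList<>();
    for (long v = 0; v < versions; v++) {
      Group g = new Group(v);
      if (!groups.isEmpty()) {
        g.locationVersionMap.putAll(
            groups.get(groups.size() - 1).locationVersionMap);
      }
      List<String> own = new ArrayList<>();
      own.add("block-" + v);
      g.locationVersionMap.put(v, own);
      groups.add(g);
    }
    return groups;
  }

  // Compact layout for comparison: each group keeps only its own version.
  static List<Group> buildCompact(int versions) {
    List<Group> groups = new ArrayList<>();
    for (long v = 0; v < versions; v++) {
      Group g = new Group(v);
      List<String> own = new ArrayList<>();
      own.add("block-" + v);
      g.locationVersionMap.put(v, own);
      groups.add(g);
    }
    return groups;
  }

  // Counts the location entries that would be written out if every group
  // were serialized on its own.
  static long countEntries(List<Group> groups) {
    long total = 0;
    for (Group g : groups) {
      for (List<String> locations : g.locationVersionMap.values()) {
        total += locations.size();
      }
    }
    return total;
  }

  public static void main(String[] args) {
    int versions = 549; // same version count as the attached example key
    System.out.println("redundant entries: "
        + countEntries(buildRedundant(versions)));
    System.out.println("compact entries:   "
        + countEntries(buildCompact(versions)));
  }
}
```

Under these assumptions the redundant layout grows quadratically with the
number of versions (n * (n + 1) / 2 entries), while the compact layout grows
linearly.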
> Old versions of location in OmKeyLocationInfoGroup causes OOM of OM
> -------------------------------------------------------------------
>
> Key: HDDS-5472
> URL: https://issues.apache.org/jira/browse/HDDS-5472
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Janus Chow
> Assignee: Janus Chow
> Priority: Major
> Labels: pull-request-available
>
> Currently the keyLocation of different versions is stored redundantly in
> OmKeyInfo.
> * OmKeyInfo:
> ** keyLocationVersions: List<OmKeyLocationInfoGroup> stores different
> versions of information in a list.
> * OmKeyLocationInfoGroup:
> ** locationVersionMap: Map<Long, List<OmKeyLocationInfo>> stores different
> versions of location in a map.
> If the number of versions is large, the redundant location information
> causes a large GC overhead for OM; in our cluster, OM even crashes because
> of OOM.
> This ticket is to remove the redundant location information to keep OM
> healthy.
> Attached the OmKeyInfo of a key with 549 versions.