[ 
https://issues.apache.org/jira/browse/HDDS-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385480#comment-17385480
 ] 

Janus Chow commented on HDDS-5472:
----------------------------------

I think it has the same source, but the outcome and solution proposed are 
different.

In our case we do see some WARN logs, but the size is mostly around 2MB, so 
the WARN log isn't what we are trying to solve here. The outcome for us is that 
OM runs out of memory (OOM) and becomes unavailable to the cluster. 

In HDDS-5243, we talked about the root cause, which is the design of 
OmKeyLocationInfoGroup. This design hurts not only the IPC response size but 
also RocksDB. Where HDDS-5243 and HDDS-5393 try to fix the issue on the 
surface, I think this ticket tries to fix it at the very source. 

As I computed in this comment 
[https://github.com/apache/ozone/pull/2261#issuecomment-843308499], we 
originally wanted to avoid this redundancy in the IPC response, but I think it 
should be avoided at a deeper level, such as on the RocksDB side and in 
[prepareKeyInfo|https://github.com/apache/ozone/blob/23352f818e74b1e59b48ec63adce28cb87c1603c/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyRequest.java#L611].
I think it would be good to recheck the design of OmKeyLocationInfoGroup here.
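
To make the redundancy concrete, here is a minimal sketch, not the actual Ozone 
code. The classes below are simplified, hypothetical stand-ins for OmKeyInfo, 
OmKeyLocationInfoGroup and OmKeyLocationInfo, and it assumes that every version 
adds exactly one block location and that each group carries a map with the 
locations of all versions up to its own. Under those assumptions the number of 
stored location entries grows quadratically with the number of versions, 
compared with linearly if each group kept only its own version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical stand-ins for the Ozone classes; not the real API.
final class LocationRedundancySketch {

  /** Stand-in for OmKeyLocationInfo: a single block location. */
  static final class Location {
    final long blockId;
    Location(long blockId) { this.blockId = blockId; }
  }

  /** Stand-in for OmKeyLocationInfoGroup: a version plus a version-to-locations map. */
  static final class LocationGroup {
    final long version;
    final Map<Long, List<Location>> locationVersionMap = new LinkedHashMap<>();
    LocationGroup(long version) { this.version = version; }

    /** Number of location entries this group holds across all map values. */
    long storedLocations() {
      long n = 0;
      for (List<Location> locations : locationVersionMap.values()) {
        n += locations.size();
      }
      return n;
    }
  }

  /** Assumed redundant layout: each group repeats all earlier versions, then adds its own. */
  static List<LocationGroup> buildRedundant(int versions) {
    List<LocationGroup> groups = new ArrayList<>();
    for (long v = 0; v < versions; v++) {
      LocationGroup group = new LocationGroup(v);
      if (!groups.isEmpty()) {
        LocationGroup previous = groups.get(groups.size() - 1);
        for (Map.Entry<Long, List<Location>> e : previous.locationVersionMap.entrySet()) {
          group.locationVersionMap.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
      }
      List<Location> own = new ArrayList<>();
      own.add(new Location(v)); // assumption: one block location per version
      group.locationVersionMap.put(v, own);
      groups.add(group);
    }
    return groups;
  }

  /** Deduplicated layout: each group holds only the locations of its own version. */
  static List<LocationGroup> buildDeduplicated(int versions) {
    List<LocationGroup> groups = new ArrayList<>();
    for (long v = 0; v < versions; v++) {
      LocationGroup group = new LocationGroup(v);
      List<Location> own = new ArrayList<>();
      own.add(new Location(v));
      group.locationVersionMap.put(v, own);
      groups.add(group);
    }
    return groups;
  }

  /** Total location entries stored across all groups of one key. */
  static long totalStored(List<LocationGroup> groups) {
    long total = 0;
    for (LocationGroup group : groups) {
      total += group.storedLocations();
    }
    return total;
  }

  public static void main(String[] args) {
    int versions = 549; // the version count of the key attached to the ticket
    System.out.println("redundant:    " + totalStored(buildRedundant(versions)));
    System.out.println("deduplicated: " + totalStored(buildDeduplicated(versions)));
  }
}
```

With 549 versions and one location per version, the redundant layout in this 
sketch holds 549 * 550 / 2 = 150975 entries, while the deduplicated layout 
holds 549. Real keys may have more than one block per version, so the actual 
numbers would differ.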

> Old versions of location in OmKeyLocationInfoGroup causes OOM of OM
> -------------------------------------------------------------------
>
>                 Key: HDDS-5472
>                 URL: https://issues.apache.org/jira/browse/HDDS-5472
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Janus Chow
>            Assignee: Janus Chow
>            Priority: Major
>              Labels: pull-request-available
>
> Currently the keyLocation of different versions is stored redundantly in 
> OmKeyInfo. 
>  * OmKeyInfo:
>  ** keyLocationVersions: List<OmKeyLocationInfoGroup>  stores different 
> versions of information in a list.
>  * OmKeyLocationInfoGroup:
>  **  locationVersionMap: Map<Long, List<OmKeyLocationInfo>> stores different 
> versions of location in a map.
> If the number of versions is large, the redundant location information causes 
> large GC overhead for OM; in our cluster, OM even crashes because of OOM.
> This ticket is to remove the redundant location information to keep OM 
> healthy.
> Attached the OmKeyInfo of a key with 549 versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

