[ 
https://issues.apache.org/jira/browse/RATIS-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875576#comment-17875576
 ] 

Duong commented on RATIS-2141:
------------------------------

[~szetszwo] In the case of the Ozone Datanode, the RaftLogCache keeps 4MB entries 
but counts only ~100 bytes for each of them in its retention policy. That falls 
into the "leak" category. But I'm OK with "OOM" too. 

> OOM for stateMachineCache use cases
> -----------------------------------
>
>                 Key: RATIS-2141
>                 URL: https://issues.apache.org/jira/browse/RATIS-2141
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.1.0
>            Reporter: Duong
>            Priority: Major
>         Attachments: RaftLogCache_entry.png, heap-dump.png
>
>
> In 3.1.0, with stateMachineCache enabled, the RaftLogCache entries contain a 
> reference to the original RaftClientRequest. This is not supposed to happen 
> as RaftLogCache entries should only refer to the LogEntries with data 
> truncated, and RaftLogCache retention policy only counts the size of the 
> entries without data.
> This problem impacts Apache Ozone. The reference from RaftLogCache entries 
> prevents the original RaftClientRequest (which contains a large data chunk) 
> from being GCed. As a result, Ozone datanodes quickly run out of heap memory.
> !heap-dump.png|width=1286,height=141!
> !RaftLogCache_entry.png|width=730,height=272!
> This is not the case with the latest master branch, only with the 3.1.0 
> release.
> The fix for this issue in 3.1.0 is as simple as 
> [6a141544c567a6325b05e2972cd426cdc14060cb|https://github.com/duongkame/ratis/commit/bcff74af0a5fa4b68af2267ce8dfa01f65a5445c].
>  
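
The mismatch described above, between what the retention policy counts and what the heap actually retains, can be illustrated with a minimal sketch. This is not Ratis code: the class names, the 64 KB limit, and the eviction loop are all hypothetical, and the 4 MB chunk is scaled down to 4 KB so the example runs in a small heap.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CacheAccountingSketch {
    /** Hypothetical stand-in for a client request carrying a large data chunk. */
    static final class Request {
        final byte[] data;
        Request(int size) { this.data = new byte[size]; }
    }

    /** Hypothetical cache entry: small metadata plus an optional back-reference. */
    static final class Entry {
        final int accountedBytes; // what the retention policy counts
        final Request origin;     // null once the reference is dropped
        Entry(int accountedBytes, Request origin) {
            this.accountedBytes = accountedBytes;
            this.origin = origin;
        }
        /** Bytes kept reachable through this entry. */
        long retainedBytes() {
            return accountedBytes + (origin == null ? 0 : origin.data.length);
        }
    }

    /** Hypothetical cache that evicts based on accounted size only. */
    static final class Cache {
        private final Deque<Entry> entries = new ArrayDeque<>();
        private final long limitBytes;
        private long accounted;
        Cache(long limitBytes) { this.limitBytes = limitBytes; }

        void add(Entry e) {
            entries.addLast(e);
            accounted += e.accountedBytes;
            // Eviction never sees the payload size, only the accounted size.
            while (accounted > limitBytes && entries.size() > 1) {
                accounted -= entries.removeFirst().accountedBytes;
            }
        }
        long accountedBytes() { return accounted; }
        long retainedBytes() {
            long sum = 0;
            for (Entry e : entries) sum += e.retainedBytes();
            return sum;
        }
    }

    /** Fills a cache with entries counted at 100 bytes each, payload 4 KB each. */
    static Cache fill(int count, boolean keepReference) {
        Cache cache = new Cache(64 * 1024);
        for (int i = 0; i < count; i++) {
            Request request = new Request(4096);
            cache.add(new Entry(100, keepReference ? request : null));
        }
        return cache;
    }

    public static void main(String[] args) {
        Cache leaky = fill(50, true);
        Cache fixed = fill(50, false);
        System.out.println("accounted=" + leaky.accountedBytes()
            + " retained(with ref)=" + leaky.retainedBytes()
            + " retained(no ref)=" + fixed.retainedBytes());
    }
}
```

In this sketch both caches report the same accounted size, so the policy never evicts, yet the one whose entries keep the back-reference pins every payload on the heap.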



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
