[ https://issues.apache.org/jira/browse/RATIS-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated RATIS-726:
--------------------------------------
    Description: 
While running freon against a 1-node Ratis setup, it was observed that the 
TimeoutScheduler holds on to the RaftClientRequest object for at least 3s (the 
default requestTimeoutDuration) even though the request is processed 
successfully and acknowledged. This creates memory pressure, causing the Ozone 
client to go OOM.

Attached is a heap dump; its analysis shows the timeout scheduler holding on to 
a total of 176 requests (88 writeChunk requests containing actual data and 88 
putBlock requests), even though data is written sequentially, key by key, in 
Ozone.

Thanks [~adoroszlai] for helping discover this.

cc: [~ljain] [~msingh] [~szetszwo] [~jnpandey]

A similar fix may be required in GrpcLogAppender as well, since it uses the 
same TimeoutScheduler.
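
For illustration only, here is a minimal Java sketch of the retention pattern described above and the kind of fix that releases the request once it is acknowledged. This is not the actual Ratis TimeoutScheduler code; the class and method names below are hypothetical, and it uses a plain ScheduledThreadPoolExecutor under assumed defaults.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Minimal sketch (NOT the actual Ratis TimeoutScheduler) of how a timeout
 * task can pin a request in memory until the timeout fires, and how
 * cancelling the task once the reply arrives releases the reference early.
 */
public class TimeoutRetentionSketch {
  private static final ScheduledThreadPoolExecutor SCHEDULER =
      new ScheduledThreadPoolExecutor(1);
  static {
    // Without this, cancelled tasks (and the request they capture) stay in
    // the scheduler's queue until their scheduled time anyway.
    SCHEDULER.setRemoveOnCancelPolicy(true);
  }

  /** Stand-in for RaftClientRequest; a writeChunk request carries chunk data. */
  static class Request {
    final byte[] payload = new byte[4 << 20];  // e.g. a 4 MB chunk (assumed size)
  }

  /**
   * Leaky pattern: the timeout lambda captures 'request', so the scheduler's
   * queue keeps it strongly reachable for the full requestTimeoutDuration
   * (3s by default) even if the reply is acknowledged within milliseconds.
   */
  static CompletableFuture<String> sendLeaky(Request request) {
    CompletableFuture<String> reply = new CompletableFuture<>();
    SCHEDULER.schedule(
        () -> reply.completeExceptionally(
            new TimeoutException("timed out, payload=" + request.payload.length)),
        3, TimeUnit.SECONDS);
    return reply;  // nothing ever cancels the pending timeout task
  }

  /** Fixed pattern: drop the timeout task as soon as the request completes. */
  static CompletableFuture<String> sendWithCancel(Request request) {
    CompletableFuture<String> reply = new CompletableFuture<>();
    ScheduledFuture<?> timeout = SCHEDULER.schedule(
        () -> reply.completeExceptionally(
            new TimeoutException("timed out, payload=" + request.payload.length)),
        3, TimeUnit.SECONDS);
    // Once the reply (or a failure) arrives, cancel the timeout so the queue
    // no longer holds the lambda, and with it the request payload.
    reply.whenComplete((r, e) -> timeout.cancel(false));
    return reply;
  }

  public static void main(String[] args) {
    CompletableFuture<String> reply = sendWithCancel(new Request());
    reply.complete("OK");  // simulate an immediate acknowledgement
    // The request is now unreachable; under sendLeaky it would have stayed
    // reachable from the scheduler queue for the full 3 seconds.
    SCHEDULER.shutdown();
  }
}
{code}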

  was:
While running freon against a 1-node Ratis setup, it was observed that the 
TimeoutScheduler holds on to the RaftClientRequest object for at least 3s (the 
default requestTimeoutDuration) even though the request is processed 
successfully and acknowledged. This creates memory pressure, causing the Ozone 
client to go OOM.

Attached is a heap dump; its analysis shows the timeout scheduler holding on to 
a total of 176 requests (88 writeChunk requests containing actual data and 88 
putBlock requests), even though data is written sequentially, key by key, in 
Ozone.

Thanks [~adoroszlai] for helping discover this.

cc: [~ljain] [~msingh] [~szetszwo] [~jnpandey]


> TimeoutScheduler holds on to the raftClientRequest till it times out even 
> though request succeeds
> -------------------------------------------------------------------------------------------------
>
>                 Key: RATIS-726
>                 URL: https://issues.apache.org/jira/browse/RATIS-726
>             Project: Ratis
>          Issue Type: Bug
>          Components: client
>            Reporter: Shashikant Banerjee
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
