[ 
https://issues.apache.org/jira/browse/RATIS-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-726:
--------------------------------------
    Description: 
While running freon with a 1-node Ratis setup, it was observed that the 
TimeoutScheduler holds on to the raftClientRequest object for at least 3s (the 
default requestTimeoutDuration) even though the request is processed successfully 
and acknowledged back. This ends up creating memory pressure, causing the Ozone 
client to go OOM.
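
To illustrate the retention pattern, here is a minimal, self-contained sketch (not 
the actual Ratis code; the class, method and field names below are hypothetical): a 
per-request timeout task whose lambda captures the request stays in the scheduler's 
queue, and therefore keeps the request strongly reachable, for the full 
requestTimeoutDuration unless the task is cancelled and removed once the reply 
arrives.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical sketch (not the actual Ratis implementation): a client schedules
 * one timeout task per request; the task's lambda captures the request payload,
 * so the scheduler's queue pins it until the delay expires unless the task is
 * removed when the reply arrives.
 */
class TimeoutRetentionSketch {
  private final ScheduledThreadPoolExecutor scheduler =
      new ScheduledThreadPoolExecutor(1);

  TimeoutRetentionSketch() {
    // Without this, cancelled tasks (and the request they capture) remain in
    // the work queue until their delay expires -- the retention seen here.
    scheduler.setRemoveOnCancelPolicy(true);
  }

  CompletableFuture<String> send(byte[] requestPayload, long timeoutMs) {
    final CompletableFuture<String> replyFuture = new CompletableFuture<>();

    // The lambda references requestPayload, so the scheduled task pins it.
    final ScheduledFuture<?> timeoutTask = scheduler.schedule(
        () -> replyFuture.completeExceptionally(new TimeoutException(
            "request of " + requestPayload.length + " bytes timed out")),
        timeoutMs, TimeUnit.MILLISECONDS);

    // Sketch of the fix: when the reply (or an error) arrives, cancel the
    // timeout task so the captured request becomes garbage-collectable
    // immediately instead of after the 3s default timeout.
    replyFuture.whenComplete((reply, error) -> timeoutTask.cancel(false));
    return replyFuture;
  }
}
{code}

The same idea should apply wherever the TimeoutScheduler is used: once the request 
completes, drop the strong reference (or remove the scheduled task) instead of 
waiting for the timer to fire.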

 Heap dump analysis from HDDS-2331 shows the timeout scheduler holding onto a 
total of 176 requests (88 writeChunk requests containing actual data and 88 
putBlock requests), even though data is written sequentially, key by key, in ozone.

Thanks [~adoroszlai] for helping discover this.

cc [~ljain] [~msingh] [~szetszwo] [~jnpandey]

A similar fix may be required in GrpcLogAppender as well, since it uses the same 
TimeoutScheduler.


> TimeoutScheduler holds on to the raftClientRequest till it times out even 
> though request succeeds
> -------------------------------------------------------------------------------------------------
>
>                 Key: RATIS-726
>                 URL: https://issues.apache.org/jira/browse/RATIS-726
>             Project: Ratis
>          Issue Type: Bug
>          Components: client
>            Reporter: Shashikant Banerjee
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
