[
https://issues.apache.org/jira/browse/RATIS-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shashikant Banerjee updated RATIS-726:
--------------------------------------
Description:
While running freon with a 1-node Ratis setup, it was observed that the
TimeoutScheduler holds on to the RaftClientRequest object for at least 3s (the
default requestTimeoutDuration) even though the request is processed
successfully and acknowledged. This ends up creating memory pressure, causing
the Ozone client to go OOM.
Heap dump analysis of HDDS-2331 shows the timeout scheduler holding onto a
total of 176 requests (88 writeChunk requests containing actual data and 88
putBlock requests), even though data is written sequentially, key by key, in
Ozone.
Thanks [~adoroszlai] for helping discover this.
cc: [~ljain] [~msingh] [~szetszwo] [~jnpandey]
A similar fix may be required in GrpcLogAppender as well, since it uses the
same TimeoutScheduler.
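For illustration only (this is not the actual Ratis TimeoutScheduler code), below is a minimal Java sketch of the retention pattern described above, using a hypothetical Request type and a plain ScheduledThreadPoolExecutor: the scheduled timeout task captures the request, so the request stays strongly reachable until the task fires, even if the reply arrives long before the 3s deadline. A possible shape of the fix is also sketched: cancel the timeout task as soon as the reply future completes, and let the executor drop cancelled tasks.
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutRetentionSketch {
  // Hypothetical request type standing in for RaftClientRequest.
  static class Request {
    final byte[] payload;
    Request(byte[] payload) { this.payload = payload; }
  }

  private final ScheduledThreadPoolExecutor scheduler =
      new ScheduledThreadPoolExecutor(1);

  // Problematic pattern: the timeout task captures the request, so the
  // request (and its payload) stays reachable until the task fires,
  // even if the reply arrives well before the 3s deadline.
  CompletableFuture<Void> sendWithLeakyTimeout(Request request) {
    CompletableFuture<Void> replyFuture = new CompletableFuture<>();
    scheduler.schedule(
        () -> replyFuture.completeExceptionally(
            new TimeoutException("request timed out: " + request)),
        3, TimeUnit.SECONDS);
    return replyFuture;
  }

  // Possible shape of the fix: keep the ScheduledFuture, cancel it once the
  // reply future completes, and configure the executor to drop cancelled
  // tasks so the captured request becomes unreachable immediately.
  CompletableFuture<Void> sendWithCancellingTimeout(Request request) {
    scheduler.setRemoveOnCancelPolicy(true);
    CompletableFuture<Void> replyFuture = new CompletableFuture<>();
    ScheduledFuture<?> timeoutTask = scheduler.schedule(
        () -> replyFuture.completeExceptionally(
            new TimeoutException("request timed out")),
        3, TimeUnit.SECONDS);
    replyFuture.whenComplete((v, e) -> timeoutTask.cancel(false));
    return replyFuture;
  }
}
{code}
With the second variant, once the reply completes, the only remaining reference to the request held by the scheduler is removed, so 88 writeChunk payloads would not accumulate while waiting for their timeouts to expire.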
was:
While running freon with a 1-node Ratis setup, it was observed that the
TimeoutScheduler holds on to the RaftClientRequest object for at least 3s (the
default requestTimeoutDuration) even though the request is processed
successfully and acknowledged. This ends up creating memory pressure, causing
the Ozone client to go OOM.
Heap dump analysis of HDDS-2331 shows the timeout scheduler holding onto a
total of 176 requests (88 writeChunk requests containing actual data) and 88
putBlock requests, even though data is written sequentially, key by key, in
Ozone.
Thanks [~adoroszlai] for helping discover this.
cc: [~ljain] [~msingh] [~szetszwo] [~jnpandey]
A similar fix may be required in GrpcLogAppender as well, since it uses the
same TimeoutScheduler.
> TimeoutScheduler holds on to the RaftClientRequest until it times out even
> though the request succeeds
> -------------------------------------------------------------------------------------------------
>
> Key: RATIS-726
> URL: https://issues.apache.org/jira/browse/RATIS-726
> Project: Ratis
> Issue Type: Bug
> Components: client
> Reporter: Shashikant Banerjee
> Priority: Major
>
> While running freon with a 1-node Ratis setup, it was observed that the
> TimeoutScheduler holds on to the RaftClientRequest object for at least 3s
> (the default requestTimeoutDuration) even though the request is processed
> successfully and acknowledged. This ends up creating memory pressure, causing
> the Ozone client to go OOM.
> Heap dump analysis of HDDS-2331 shows the timeout scheduler holding onto a
> total of 176 requests (88 writeChunk requests containing actual data and 88
> putBlock requests), even though data is written sequentially, key by key, in
> Ozone.
> Thanks [~adoroszlai] for helping discover this.
> cc: [~ljain] [~msingh] [~szetszwo] [~jnpandey]
> A similar fix may be required in GrpcLogAppender as well, since it uses the
> same TimeoutScheduler.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)