> ...  The corresponding release() is in DataStreamManagement.read(..); see
[2].

Oops, sorry that [2] is for releasing the buf in cases when there is an
exception.  (I somehow saw the catch-block as a finally-block.  My
mistake!)  For the normal case, the release is in [6].

> 5. Once readImpl(..) has returned, buf is released in [2].

This Step #5 described previously is also incorrect.  Even readImpl(..) has
returned, buf may not be released since the local and remote writes in Step
#4 are async.  Therefore, it has to wait for all the local and remote
futures to complete and then release buf in [6].  Note that [6] is inside
whenComplete(..) in the code which is for all the local and remote futures.

Hope that I did not confuse you too much.
Tsz-Wo

[6]
https://github.com/apache/ratis/blob/75af736d0d20c44b5d47a458b0c7b33387a75a6d/ratis-netty/src/main/java/org/apache/ratis/netty/server/DataStreamManagement.java#L444

On Sun, Jun 26, 2022 at 10:52 AM Tsz Wo Sze <[email protected]> wrote:

> Hi William,
>
> Thanks a lot for checking the code.
>
> > retain() is called here, but seems no corresponding release().
>
> It is necessary to call retain() since buf will be used after decode(..).
> Otherwise, buf will automatically be released as mentioned in [1].  The
> corresponding release() is in DataStreamManagement.read(..); see [2].
>
> The steps are
> 1. Netty is reading a byte stream
> 2. It calls ByteToMessageDecoder.decode(.., List<Object> out) [3]  to
> decode the byte stream and then adds the decoded objects to the out list.
> 3. In our case, the decoded objects are DataStreamRequestByteBuf and the
> ByteBuf is retained as DataStreamRequestByteBuf.buf.
> 4. Later on, the same DataStreamRequestByteBuf.buf is used for writing
> locally and writing to the remote peers in readImpl(..) [4].
> 5. Once readImpl(..) has returned, buf is released in [2].
>
> If there is a suspicion of memory leak, could you try
> -Dio.netty.leakDetection.level=ADVANCED [5] ?
>
> Regards,
> Tsz-Wo
>
>
> [1]
> https://stackoverflow.com/questions/52883044/netty-4-0-bytebuf-retain-use
> [2]
> https://github.com/apache/ratis/blob/75af736d0d20c44b5d47a458b0c7b33387a75a6d/ratis-netty/src/main/java/org/apache/ratis/netty/server/DataStreamManagement.java#L385
> [3]
> https://github.com/apache/ratis/blob/75af736d0d20c44b5d47a458b0c7b33387a75a6d/ratis-netty/src/main/java/org/apache/ratis/netty/server/NettyServerStreamRpc.java#L243
> [4]
> https://github.com/apache/ratis/blob/75af736d0d20c44b5d47a458b0c7b33387a75a6d/ratis-netty/src/main/java/org/apache/ratis/netty/server/DataStreamManagement.java#L390
> [5]
> https://stackoverflow.com/questions/43676085/how-to-test-netty-memory-leak
>
> On Sun, Jun 26, 2022 at 5:00 AM William Song <[email protected]> wrote:
>
>> Hi,
>>
>> We discovered Direct Memory OOM after running program with Ratis for a
>> long time. Through heap dump, we find DirectByteBuffers holded by
>> ratis.thirdparty.netty are increasing.
>>
>>  I discovered that in NettyServerStreamRPC.java, newChannelInitializer,
>> newDecoder, decodeDataStreamRequestByteBuf
>> static DataStreamRequestByteBuf decodeDataStreamRequestByteBuf(ByteBuf
>> buf) {
>>   return Optional.ofNullable(decodeDataStreamRequestHeader(buf))
>>       .map(header -> checkHeader(header, buf))
>>       .map(header -> new DataStreamRequestByteBuf(header, decodeData(buf,
>> header, ByteBuf::retain)))
>>       .orElse(null);
>> }
>>
>> retain() is called here, but seems no corresponding release(). Since
>> retain() will add buf's (which is DirectByteBuffer) referenced count, so
>> Cleaner cannot GC the buf and recycle its corresponding resources. I assume
>> that this might be the cause of direct memory OOM error, but I’m new to
>> Netty and not certain about it, so could anyone please take a look at it?
>> Thanks in advance.
>>
>> Best Wishes,
>> William Song
>>
>>
>>
>>

Reply via email to