> ... The corresponding release() is in DataStreamManagement.read(..); see [2].
Oops, sorry that [2] is for releasing the buf in cases when there is an exception. (I somehow saw the catch-block as a finally-block. My mistake!) For the normal case, the release is in [6]. > 5. Once readImpl(..) has returned, buf is released in [2]. This Step #5 described previously is also incorrect. Even readImpl(..) has returned, buf may not be released since the local and remote writes in Step #4 are async. Therefore, it has to wait for all the local and remote futures to complete and then release buf in [6]. Note that [6] is inside whenComplete(..) in the code which is for all the local and remote futures. Hope that I did not confuse you too much. Tsz-Wo [6] https://github.com/apache/ratis/blob/75af736d0d20c44b5d47a458b0c7b33387a75a6d/ratis-netty/src/main/java/org/apache/ratis/netty/server/DataStreamManagement.java#L444 On Sun, Jun 26, 2022 at 10:52 AM Tsz Wo Sze <[email protected]> wrote: > Hi William, > > Thanks a lot for checking the code. > > > retain() is called here, but seems no corresponding release(). > > It is necessary to call retain() since buf will be used after decode(..). > Otherwise, buf will automatically be released as mentioned in [1]. The > corresponding release() is in DataStreamManagement.read(..); see [2]. > > The steps are > 1. Netty is reading a byte stream > 2. It calls ByteToMessageDecoder.decode(.., List<Object> out) [3] to > decode the byte stream and then adds the decoded objects to the out list. > 3. In our case, the decoded objects are DataStreamRequestByteBuf and the > ByteBuf is retained as DataStreamRequestByteBuf.buf. > 4. Later on, the same DataStreamRequestByteBuf.buf is used for writing > locally and writing to the remote peers in readImpl(..) [4]. > 5. Once readImpl(..) has returned, buf is released in [2]. > > If there is a suspicion of memory leak, could you try > -Dio.netty.leakDetection.level=ADVANCED [5] ? > > Regards, > Tsz-Wo > > > [1] > https://stackoverflow.com/questions/52883044/netty-4-0-bytebuf-retain-use > [2] > https://github.com/apache/ratis/blob/75af736d0d20c44b5d47a458b0c7b33387a75a6d/ratis-netty/src/main/java/org/apache/ratis/netty/server/DataStreamManagement.java#L385 > [3] > https://github.com/apache/ratis/blob/75af736d0d20c44b5d47a458b0c7b33387a75a6d/ratis-netty/src/main/java/org/apache/ratis/netty/server/NettyServerStreamRpc.java#L243 > [4] > https://github.com/apache/ratis/blob/75af736d0d20c44b5d47a458b0c7b33387a75a6d/ratis-netty/src/main/java/org/apache/ratis/netty/server/DataStreamManagement.java#L390 > [5] > https://stackoverflow.com/questions/43676085/how-to-test-netty-memory-leak > > On Sun, Jun 26, 2022 at 5:00 AM William Song <[email protected]> wrote: > >> Hi, >> >> We discovered Direct Memory OOM after running program with Ratis for a >> long time. Through heap dump, we find DirectByteBuffers holded by >> ratis.thirdparty.netty are increasing. >> >> I discovered that in NettyServerStreamRPC.java, newChannelInitializer, >> newDecoder, decodeDataStreamRequestByteBuf >> static DataStreamRequestByteBuf decodeDataStreamRequestByteBuf(ByteBuf >> buf) { >> return Optional.ofNullable(decodeDataStreamRequestHeader(buf)) >> .map(header -> checkHeader(header, buf)) >> .map(header -> new DataStreamRequestByteBuf(header, decodeData(buf, >> header, ByteBuf::retain))) >> .orElse(null); >> } >> >> retain() is called here, but seems no corresponding release(). Since >> retain() will add buf's (which is DirectByteBuffer) referenced count, so >> Cleaner cannot GC the buf and recycle its corresponding resources. I assume >> that this might be the cause of direct memory OOM error, but I’m new to >> Netty and not certain about it, so could anyone please take a look at it? >> Thanks in advance. >> >> Best Wishes, >> William Song >> >> >> >>
