Hi William,

When the log entry size is large, GrpcLogAppender is known to have high memory usage. As you described, the gRPC library allocates a buffer for each appendEntries call, including retries. It seems that we don't have any control over this part.

A long term solution is to use Streaming to implement a LogAppender; see [7]. As a workaround, you may consider using Streaming to write data in your application so that the log entries contain only metadata. This is what we are doing in Apache Ozone [8]. We got very good performance -- Streaming is around 2 times faster than Async for a single client and 3 times faster for 3 clients; see "20220702_Single Client - Benchmarks for Ozone Streaming.pdf" and "20220702_3 Clients - Benchmarks for Ozone Streaming.pdf" in [8]. Ratis Streaming also gives us other benefits, such as reducing the workload of the leader. For more information, see RatisStreaming20200929.pdf in [9].
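To illustrate, writing data through the stream API could look roughly like the sketch below. (This is only a sketch based on the DataStreamApi; the class and method names may differ in the Ratis version you are using, so please double-check before relying on it.)

import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;

import org.apache.ratis.client.RaftClient;
import org.apache.ratis.client.api.DataStreamOutput;
import org.apache.ratis.protocol.DataStreamReply;

public final class StreamingWriteSketch {
  // Write bulk data via Streaming so that the raft log entries
  // carry only metadata instead of the data itself.
  static CompletableFuture<DataStreamReply> writeViaStream(RaftClient client, ByteBuffer data) {
    final DataStreamOutput out = client.getDataStreamApi().stream();
    return out.writeAsync(data)                   // bytes go over the streaming channel
        .thenCompose(reply -> out.closeAsync());  // closing completes the stream
  }
}

With this approach the data bytes are no longer embedded in appendEntries, so GrpcLogAppender does not have to buffer them once per follower.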
Tsz-Wo

[7] https://issues.apache.org/jira/browse/RATIS-1310
[8] https://issues.apache.org/jira/browse/HDDS-4454
[9] https://issues.apache.org/jira/browse/RATIS-979

On Sat, Jul 2, 2022 at 7:21 AM William Song <[email protected]> wrote:

> Thanks for the detailed explanation; now I can say I'm sure that there is
> no direct memory leak in NettyStreamImpl. I suspect the DirectOOM we
> encountered days before was caused by a real shortage of memory.
>
> The GrpcLogAppender uses Netty as its underlying network channel, and
> Netty allocates a direct ByteBuffer for each AppendEntries (AE) RPC. When
> log entries each contain megabytes of data and arrive at the Raft leader at
> high throughput, Netty cannot free pooled direct ByteBuffers as fast as it
> allocates them, which leads to a steady increase in direct memory usage and
> eventually OOM.
>
> To make things worse, if a group contains 2n+1 members, the leader has to
> send 2n AE requests for each entry, which multiplies the direct memory
> usage by a factor of 2n: there will be only 1 copy in the heap but 2n
> copies in direct memory.
>
> What can we do to reduce the direct memory usage? Is it possible to let
> the 2n AE RPCs share the same direct ByteBuffer?
>
> PS: the call stack of the AE SendRequest
>
> <screenshot removed>
>
> Best Wishes,
> William Song
>
> On Jul 1, 2022, at 05:14, Tsz Wo Sze <[email protected]> wrote:
>
> > ... The corresponding release() is in DataStreamManagement.read(..); see
> > [2].
>
> Oops, sorry that [2] releases the buf only in the case where there is an
> exception. (I somehow saw the catch-block as a finally-block. My
> mistake!) For the normal case, the release is in [6].
>
> > 5. Once readImpl(..) has returned, buf is released in [2].
>
> Step #5 described previously is also incorrect. Even after readImpl(..)
> has returned, buf may not be released, since the local and remote writes in
> Step #4 are async. Therefore, it has to wait for all the local and remote
> futures to complete and then release buf in [6]. Note that [6] is inside a
> whenComplete(..) that is chained on all the local and remote futures.
>
> Hope that I did not confuse you too much.
> Tsz-Wo
>
> [6] https://github.com/apache/ratis/blob/75af736d0d20c44b5d47a458b0c7b33387a75a6d/ratis-netty/src/main/java/org/apache/ratis/netty/server/DataStreamManagement.java#L444
>
> On Sun, Jun 26, 2022 at 10:52 AM Tsz Wo Sze <[email protected]> wrote:
>
> Hi William,
>
> Thanks a lot for checking the code.
>
> > retain() is called here, but seems no corresponding release().
>
> It is necessary to call retain() since buf will be used after decode(..).
> Otherwise, buf would be released automatically, as mentioned in [1]. The
> corresponding release() is in DataStreamManagement.read(..); see [2].
>
> The steps are
> 1. Netty is reading a byte stream.
> 2. It calls ByteToMessageDecoder.decode(.., List<Object> out) [3] to decode
> the byte stream and then adds the decoded objects to the out list.
> 3. In our case, the decoded objects are DataStreamRequestByteBuf and the
> ByteBuf is retained as DataStreamRequestByteBuf.buf.
> 4. Later on, the same DataStreamRequestByteBuf.buf is used for writing
> locally and writing to the remote peers in readImpl(..) [4].
> 5. Once readImpl(..) has returned, buf is released in [2].
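> To make the retain/release contract concrete, here is a simplified sketch
> in plain Netty (hypothetical class names, not the actual Ratis code; and,
> as corrected above, the release must wait for the async writes to
> complete):
>
> import io.netty.buffer.ByteBuf;
> import io.netty.channel.ChannelHandlerContext;
> import io.netty.handler.codec.ByteToMessageDecoder;
> import java.util.List;
> import java.util.concurrent.CompletableFuture;
>
> class RetainingDecoder extends ByteToMessageDecoder {
>   @Override
>   protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
>     if (in.readableBytes() < 4) {
>       return;  // wait for the length header
>     }
>     in.markReaderIndex();
>     final int length = in.readInt();
>     if (in.readableBytes() < length) {
>       in.resetReaderIndex();  // frame incomplete, wait for more bytes
>       return;
>     }
>     // readRetainedSlice(..) bumps the refCnt so the data survives after
>     // decode(..) returns; without it, Netty releases the buffer.
>     out.add(in.readRetainedSlice(length));
>   }
> }
>
> class RetainedBufConsumer {
>   // Release only after ALL async writes have completed; releasing when
>   // the method returns could free the buffer while writes still use it.
>   static void releaseWhenDone(ByteBuf buf, CompletableFuture<?> local, CompletableFuture<?> remote) {
>     CompletableFuture.allOf(local, remote)
>         .whenComplete((v, e) -> buf.release());
>   }
> }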
> If there is a suspicion of a memory leak, could you try
> -Dio.netty.leakDetection.level=ADVANCED [5]?
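> (A sketch of the programmatic equivalent; note that Ratis shades Netty,
> so with the shaded jar the class and the property name likely live under
> the org.apache.ratis.thirdparty prefix.)
>
> import io.netty.util.ResourceLeakDetector;
>
> public final class LeakDetectionSetup {
>   public static void main(String[] args) {
>     // Same effect as -Dio.netty.leakDetection.level=ADVANCED; set it
>     // before any buffers are allocated.
>     ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.ADVANCED);
>   }
> }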
> Regards,
> Tsz-Wo
>
> [1] https://stackoverflow.com/questions/52883044/netty-4-0-bytebuf-retain-use
> [2] https://github.com/apache/ratis/blob/75af736d0d20c44b5d47a458b0c7b33387a75a6d/ratis-netty/src/main/java/org/apache/ratis/netty/server/DataStreamManagement.java#L385
> [3] https://github.com/apache/ratis/blob/75af736d0d20c44b5d47a458b0c7b33387a75a6d/ratis-netty/src/main/java/org/apache/ratis/netty/server/NettyServerStreamRpc.java#L243
> [4] https://github.com/apache/ratis/blob/75af736d0d20c44b5d47a458b0c7b33387a75a6d/ratis-netty/src/main/java/org/apache/ratis/netty/server/DataStreamManagement.java#L390
> [5] https://stackoverflow.com/questions/43676085/how-to-test-netty-memory-leak
>
> On Sun, Jun 26, 2022 at 5:00 AM William Song <[email protected]> wrote:
>
> Hi,
>
> We discovered a Direct Memory OOM after running a program with Ratis for
> a long time. Through a heap dump, we found that the DirectByteBuffers held
> by ratis.thirdparty.netty keep increasing.
>
> I discovered that in NettyServerStreamRpc.java (newChannelInitializer,
> newDecoder), decodeDataStreamRequestByteBuf is called:
>
> static DataStreamRequestByteBuf decodeDataStreamRequestByteBuf(ByteBuf buf) {
>   return Optional.ofNullable(decodeDataStreamRequestHeader(buf))
>       .map(header -> checkHeader(header, buf))
>       .map(header -> new DataStreamRequestByteBuf(header,
>           decodeData(buf, header, ByteBuf::retain)))
>       .orElse(null);
> }
>
> retain() is called here, but there seems to be no corresponding
> release(). Since retain() increases the reference count of buf (which is a
> DirectByteBuffer), the Cleaner cannot GC buf and recycle its corresponding
> resources. I assume that this might be the cause of the direct memory OOM
> error, but I'm new to Netty and not certain about it, so could anyone
> please take a look at it? Thanks in advance.
>
> Best Wishes,
> William Song