Hi William,

Indeed, there is a recent gRPC "ByteBuffer memory leak in retry mechanism" issue; see https://github.com/grpc/grpc-java/issues/9340 . Not sure if it is the same problem you saw.

Tsz-Wo

On Tue, Jul 19, 2022 at 6:13 PM Tsz Wo Sze <[email protected]> wrote:

> Hi William,
>
> > ... We use gRPC as their underlying communication channel. ...
>
> I searched the source code of IoTDB. IoTDB uses neither the Ratis
> Streaming API nor anything in org.apache.ratis.thirdparty.io.netty.
> Therefore, the leak seems to be from the gRPC library.
>
> Tsz-Wo
>
>
> On Tue, Jul 19, 2022 at 1:22 AM William Song <[email protected]> wrote:
>
>> Hi Tsz-Wo,
>>
>> We set up a cluster of IoTDB Datanodes, which constitute a Raft Group
>> with 3 members, and have 3 clients writing data to these 3 servers
>> respectively. We use gRPC as their underlying communication channel.
>> After 48h of running, the 3 clients wrote about 100GB of data. Notably,
>> 1 server is particularly slow and is about 2000 logs behind. On this
>> slow server we discovered the direct memory OOM error. This happens
>> occasionally and is not deterministic.
>>
>> William
>>
>>
>> > On Jul 19, 2022, at 00:51, Tsz Wo Sze <[email protected]> wrote:
>> >
>> > Hi William,
>> >
>> > It does look like a leak. Could you provide the steps for
>> > reproducing it?
>> >
>> > Tsz-Wo
>> >
>> >
>> > On Mon, Jul 18, 2022 at 8:41 AM William Song <[email protected]> wrote:
>> > Hi,
>> >
>> > We discovered an error log from
>> > org.apache.ratis.thirdparty.io.netty.utils.ResourceLeakDetector
>> > saying ByteBuf.release() is not called before it’s
>> > garbage-collected. The following is the error log screenshot. We
>> > encountered direct memory OOM several times when running Ratis for
>> > a long time, so we assume this message may have something to do
>> > with the direct memory OOM problem.
>> >
>> > Could anyone please take a look and check whether there is a memory
>> > leak? Thanks in advance!
>> >
>> > Best Wishes,
>> > William
