Hi Tsz-Wo,

We set up a cluster of IoTDB Datanodes that constitute a Raft group with 3 members, and have 3 clients writing data to these 3 servers respectively. We use gRPC as the underlying communication channel. After 48 hours of running, the 3 clients had written about 100 GB of data. Notably, 1 server is particularly slow and is about 2000 log entries behind. On this slow server we found the direct memory OOM error. It happens occasionally and is not deterministic.

William

> On Jul 19, 2022, at 00:51, Tsz Wo Sze <[email protected]> wrote:
>
> Hi William,
>
> It does look like a leak. Could you provide the steps for reproducing it?
>
> Tsz-Wo
>
>
> On Mon, Jul 18, 2022 at 8:41 AM William Song <[email protected]> wrote:
> Hi,
>
> We discovered an error log from
> org.apache.ratis.thirdparty.io.netty.utils.ResourceLeakDetector saying
> ByteBuf.release() was not called before it’s garbage-collected. The following
> is the error log screenshot. We have encountered direct memory OOM several
> times when running Ratis for a long time, so we suspect this message is
> related to the direct memory OOM problem.
>
> Could anyone please take a look and check whether there is a memory leak?
> Thanks in advance!
>
> Best Wishes,
> William
