Hi Tsz-Wo,

It indeed looks like the same problem. I’ll add netty.leakDetectionLevel=paranoid to see if I can obtain more information.

William

> On Jul 21, 2022, at 01:13, Tsz Wo Sze <[email protected]> wrote:
>
> Hi William,
>
> Indeed, there is a recent gRPC "ByteBuffer memory leak in retry mechanism"
> issue; see https://github.com/grpc/grpc-java/issues/9340 . Not sure if it
> is the same problem you saw.
>
> Tsz-Wo
>
>
> On Tue, Jul 19, 2022 at 6:13 PM Tsz Wo Sze <[email protected]> wrote:
>
>> Hi William,
>>
>>> ... We use gRPC as their underlying communication channel. ...
>>
>> I searched the source code of IoTDB. IoTDB uses neither the Ratis
>> Streaming API nor anything in org.apache.ratis.thirdparty.io.netty.
>> Therefore, the leak seems to be from the gRPC library.
>>
>> Tsz-Wo
>>
>>
>> On Tue, Jul 19, 2022 at 1:22 AM William Song <[email protected]> wrote:
>>
>>> Hi Tsz-Wo,
>>>
>>> We set up a cluster of IoTDB DataNodes, which constitute a Raft group
>>> with 3 members, and have 3 clients writing data to these 3 servers
>>> respectively. We use gRPC as their underlying communication channel. After
>>> 48 hours of running, the 3 clients had written about 100 GB of data. It is
>>> worth noting that 1 server is particularly slow and is about 2000 logs
>>> behind. On this slow server we found the direct memory OOM error. It
>>> happens occasionally and is not deterministic.
>>>
>>> William
>>>
>>>
>>>> On Jul 19, 2022, at 00:51, Tsz Wo Sze <[email protected]> wrote:
>>>>
>>>> Hi William,
>>>>
>>>> It does look like a leak. Could you provide the steps for reproducing
>>>> it?
>>>>
>>>> Tsz-Wo
>>>>
>>>>
>>>> On Mon, Jul 18, 2022 at 8:41 AM William Song <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> We discovered an error log from
>>>> org.apache.ratis.thirdparty.io.netty.util.ResourceLeakDetector saying
>>>> ByteBuf.release() was not called before it’s garbage-collected. The
>>>> following is the error log screenshot. We encountered direct memory OOM
>>>> several times when running Ratis for a long time, so we assume this message
>>>> may have something to do with the direct memory OOM problem.
>>>>
>>>> Could anyone please take a look and check whether there is a memory
>>>> leak? Thanks in advance!
>>>>
>>>> Best Wishes,
>>>> William
