Hi William,

According to this comment
https://github.com/grpc/grpc-java/issues/9340#issuecomment-1185995690 ,
they will have a fix in 1.48.1 soon.

Tsz-Wo

On Wed, Jul 20, 2022 at 7:43 PM William Song <[email protected]> wrote:

> Hi Tsz-Wo,
>
> It indeed looks like the same problem. I’ll add
> netty.leakDetectionLevel=paranoid to see if I can obtain more information.
>
> William
>
> > On Jul 21, 2022, at 01:13, Tsz Wo Sze <[email protected]> wrote:
> >
> > Hi William,
> >
> > Indeed, there is a recent gRPC "ByteBuffer memory leak in retry
> > mechanism" issue; see https://github.com/grpc/grpc-java/issues/9340 .
> > Not sure if it is the same problem you saw.
> >
> > Tsz-Wo
> >
> >
> > On Tue, Jul 19, 2022 at 6:13 PM Tsz Wo Sze <[email protected]> wrote:
> >
> >> Hi William,
> >>
> >>> ... We use gRPC as their underlying communication channel. ...
> >>
> >> I searched the source code of IoTDB. IoTDB uses neither the Ratis
> >> Streaming API nor anything in org.apache.ratis.thirdparty.io.netty.
> >> Therefore, the leak seems to be from the gRPC library.
> >>
> >> Tsz-Wo
> >>
> >>
> >> On Tue, Jul 19, 2022 at 1:22 AM William Song <[email protected]> wrote:
> >>
> >>> Hi Tsz-Wo,
> >>>
> >>> We set up a cluster of IoTDB Datanodes, which constitute a Raft Group
> >>> with 3 members, and have 3 clients writing data to these 3 servers
> >>> respectively. We use gRPC as their underlying communication channel.
> >>> After 48h of running, the 3 clients wrote about 100 GB of data.
> >>> Notably, 1 server is particularly slow and is about 2000 logs behind.
> >>> On this slow server we discovered the direct memory OOM error. This
> >>> happens occasionally and is not deterministic.
> >>>
> >>> William
> >>>
> >>>
> >>>> On Jul 19, 2022, at 00:51, Tsz Wo Sze <[email protected]> wrote:
> >>>>
> >>>> Hi William,
> >>>>
> >>>> It does look like a leak. Could you provide the steps for
> >>>> reproducing it?
> >>>>
> >>>> Tsz-Wo
> >>>>
> >>>>
> >>>> On Mon, Jul 18, 2022 at 8:41 AM William Song <[email protected]> wrote:
> >>>> Hi,
> >>>>
> >>>> We discovered an error log from
> >>>> org.apache.ratis.thirdparty.io.netty.util.ResourceLeakDetector
> >>>> saying ByteBuf.release() is not called before it’s
> >>>> garbage-collected. The following is the error log screenshot. We
> >>>> encountered direct memory OOM several times when running Ratis for
> >>>> a long time, so we assume this message may have something to do
> >>>> with the direct memory OOM problem.
> >>>>
> >>>> Could anyone please take a look and check whether there is a memory
> >>>> leak? Thanks in advance!
> >>>>
> >>>> Best Wishes,
> >>>> William
