Hi William,

Indeed, there is a recent gRPC "ByteBuffer memory leak in retry mechanism"
issue; see https://github.com/grpc/grpc-java/issues/9340 .  Not sure if it
is the same problem you saw.
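
To help narrow it down, you could raise Netty's leak detector to its most
verbose level, which records every buffer access and points at the code
path that dropped the last reference.  A minimal sketch of the JVM flags
(note: the first property name is for stock Netty; the relocated name for
Ratis's shaded Netty is my assumption, so please verify it against the
thirdparty jar before relying on it):

```properties
# Stock Netty / gRPC; expect noticeable overhead, use only while reproducing.
-Dio.netty.leakDetection.level=paranoid
# If the leaked buffers come through Ratis's shaded Netty, the relocated
# property name may apply instead (assumption -- check the shaded jar).
-Dorg.apache.ratis.thirdparty.io.netty.leakDetection.level=paranoid
```

With paranoid-level records in the leak report, the stack traces should
show whether the unreleased ByteBuf is allocated by gRPC or elsewhere.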

Tsz-Wo


On Tue, Jul 19, 2022 at 6:13 PM Tsz Wo Sze <[email protected]> wrote:

> Hi William,
>
> > ... We use gRPC as their underlying communication channel. ...
>
> I searched the source code of IoTDB.  IoTDB uses neither the Ratis
> Streaming API nor anything in org.apache.ratis.thirdparty.io.netty.
> Therefore, the leak seems to be from the gRPC library.
>
> Tsz-Wo
>
>
> On Tue, Jul 19, 2022 at 1:22 AM William Song <[email protected]> wrote:
>
>> Hi Tsz-Wo,
>>
>> We set up a cluster of IoTDB Datanodes, which consititude a Raft Group
>> with 3 members, and have 3 clients writing data to these 3 servers
>> respectively.  We use gRPC as their underlying communication channel. After
>> 48h of running, the 3 clients writes about 100GB data. Worth to notice, 1
>> server is particularly slow and is about 2000 logs behind. In this slow
>> server we discovered the direct memory OOM error. This happens occasionally
>> and is not deterministic.
>>
>> William
>>
>>
>>
>> > 2022年7月19日 00:51,Tsz Wo Sze <[email protected]> 写道:
>> >
>> > Hi William,
>> >
>> > It does look like a leak.  Could you provide the steps for reproducing
>> it?
>> >
>> > Tsz-Wo
>> >
>> >
>> > On Mon, Jul 18, 2022 at 8:41 AM William Song <[email protected]> wrote:
>> > Hi,
>> >
>> > We discovered an error log from
>> org.apache.ratis.thirdparty.io.netty.util.ResourceLeakDetector saying
>> ByteBuf.release() was not called before the buffer was garbage-collected.
>> The following is the error log screenshot.  We encountered direct memory
>> OOM several times when running Ratis for a long time, so we assume this
>> message may have something to do with the direct memory OOM problem.
>> >
>> > Could anyone please take a look and check whether there is a memory
>> leak?  Thanks in advance!
>> >
>> > Best Wishes,
>> > William
>>
>>
