Hi Tsz-Wo,

It indeed looks like the same problem. I’ll add 
netty.leakDetectionLevel=paranoid to see if I can obtain more information.

William

> On Jul 21, 2022, at 01:13, Tsz Wo Sze <[email protected]> wrote:
> 
> Hi William,
> 
> Indeed, there is a recent gRPC "ByteBuffer memory leak in retry mechanism"
> issue; see https://github.com/grpc/grpc-java/issues/9340 .  Not sure if it
> is the same problem you saw.
> 
> Tsz-Wo
> 
> 
> On Tue, Jul 19, 2022 at 6:13 PM Tsz Wo Sze <[email protected]> wrote:
> 
>> Hi William,
>> 
>>> ... We use gRPC as their underlying communication channel. ...
>> 
>> I searched the source code of IoTDB.  IoTDB uses neither the Ratis
>> Streaming API nor anything in org.apache.ratis.thirdparty.io.netty.
>> Therefore, the leak seems to be from the gRPC library.
>> 
>> Tsz-Wo
>> 
>> 
>> On Tue, Jul 19, 2022 at 1:22 AM William Song <[email protected]> wrote:
>> 
>>> Hi Tsz-Wo,
>>> 
>>> We set up a cluster of IoTDB Datanodes, which constitute a Raft group
>>> with 3 members, and have 3 clients writing data to these 3 servers
>>> respectively.  We use gRPC as their underlying communication channel. After
>>> 48h of running, the 3 clients had written about 100GB of data. Notably, 1
>>> server is particularly slow and is about 2000 log entries behind. On this
>>> slow server we discovered the direct memory OOM error. This happens
>>> occasionally and is not deterministic.
>>> 
>>> William
>>> 
>>> 
>>> 
>>>> On Jul 19, 2022, at 00:51, Tsz Wo Sze <[email protected]> wrote:
>>>> 
>>>> Hi William,
>>>> 
>>>> It does look like a leak.  Could you provide the steps for reproducing it?
>>>> 
>>>> Tsz-Wo
>>>> 
>>>> 
>>>> On Mon, Jul 18, 2022 at 8:41 AM William Song <[email protected]> wrote:
>>>> Hi,
>>>> 
>>>> We discovered an error log from
>>>> org.apache.ratis.thirdparty.io.netty.util.ResourceLeakDetector saying
>>>> that ByteBuf.release() was not called before the buffer was
>>>> garbage-collected. The following is the error log screenshot. We
>>>> encountered direct memory OOM several times when running Ratis for a long
>>>> time, so we assume this message may have something to do with the direct
>>>> memory OOM problem.
>>>> 
>>>> Could anyone please take a look and check whether there is a memory
>>>> leak? Thanks in advance!
>>>> 
>>>> Best Wishes,
>>>> William
>>> 
>>> 
