YanshuoH opened a new issue #12716:
URL: https://github.com/apache/pulsar/issues/12716
Hi,
**Describe the bug**
We recently encountered a memory leak in the Pulsar broker. I've tried to identify
the issue with the logs below.
I've set `io.netty.leakDetectionLevel` to `paranoid` and
`io.netty.leakDetection.targetRecords=40`, but no leak messages ever
appeared.
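For reference, the detection flags were passed as JVM system properties. A minimal sketch, assuming the standard Pulsar startup scripts (which pick up `PULSAR_EXTRA_OPTS`):

```shell
# Hypothetical example: enable Netty's most aggressive leak detection on the
# broker JVM via PULSAR_EXTRA_OPTS before starting the broker.
export PULSAR_EXTRA_OPTS="-Dio.netty.leakDetectionLevel=paranoid -Dio.netty.leakDetection.targetRecords=40"
bin/pulsar broker
```

Even at `paranoid`, detection only reports buffers that are garbage-collected without being released, so a leak where references are still held would produce no messages.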
We have 4 Pulsar clusters:
- version 2.7.3
- all running in Kubernetes 1.20

Only one cluster has encountered this issue.
The only difference in the affected cluster is that Geo-Replication was
enabled on all namespaces. I have since disabled it, but the memory leak remains.
```
14:44:01.497 [pulsar-io-23-2] ERROR org.apache.pulsar.PulsarBrokerStarter -
-- Shutting down - Received OOM exception: failed to allocate 16777216 byte(s)
of direct memory (used: 1073741824, max: 1073741824)
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216
byte(s) of direct memory (used: 1073741824, max: 1073741824)
at
io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:776)
~[io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at
io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:731)
~[io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at
io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:645)
~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:621)
~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:204)
~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:188)
~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:138)
~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:128)
~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at
io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:378)
~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at
org.apache.bookkeeper.common.allocator.impl.ByteBufAllocatorImpl.newDirectBuffer(ByteBufAllocatorImpl.java:164)
[org.apache.bookkeeper-bookkeeper-common-allocator-4.12.0.jar:4.12.0]
at
org.apache.bookkeeper.common.allocator.impl.ByteBufAllocatorImpl.newDirectBuffer(ByteBufAllocatorImpl.java:158)
[org.apache.bookkeeper-bookkeeper-common-allocator-4.12.0.jar:4.12.0]
at
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at
io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53)
[io.netty-netty-transport-native-unix-common-4.1.60.Final-linux-x86_64.jar:4.1.60.Final]
at
io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
[io.netty-netty-transport-4.1.60.Final.jar:4.1.60.Final]
at
io.netty.channel.epoll.EpollRecvByteAllocatorHandle.allocate(EpollRecvByteAllocatorHandle.java:75)
[io.netty-netty-transport-native-epoll-4.1.60.Final-linux-x86_64.jar:4.1.60.Final]
at
io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:780)
[io.netty-netty-transport-native-epoll-4.1.60.Final-linux-x86_64.jar:4.1.60.Final]
at
io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:425)
[io.netty-netty-transport-native-epoll-4.1.60.Final-linux-x86_64.jar:4.1.60.Final]
at
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
[io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
[io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
[io.netty-netty-transport-native-epoll-4.1.60.Final-linux-x86_64.jar:4.1.60.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
[io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
[io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_302]
```
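The allocator hit the direct-memory cap: used and max are both 1073741824 bytes (1 GiB). That cap comes from `-XX:MaxDirectMemorySize`. On a test cluster, one way to distinguish a genuine leak from simple undersizing is to raise the cap and watch whether usage grows without bound. A sketch, assuming the standard `PULSAR_MEM` environment variable is used to set broker JVM memory options:

```shell
# Hypothetical sizing experiment: raise the direct-memory cap so a genuine
# leak shows up as steady growth over time rather than an immediate OOM at 1 GiB.
export PULSAR_MEM="-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g"
bin/pulsar broker
```

If usage plateaus at the new cap under the same workload, the original limit was simply too small; if it climbs steadily regardless of the cap, that points to a leak.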
**Screenshots**
This is only a test cluster without much memory configured, but I believe
the screenshots show the leak.
<img width="866" alt="Screen Shot 2021-11-10 at 9 35 54 AM"
src="https://user-images.githubusercontent.com/6973092/141083424-9976b698-2f01-46cc-8afa-b2467b4e30f8.png">
<img width="848" alt="Screen Shot 2021-11-10 at 9 36 14 AM"
src="https://user-images.githubusercontent.com/6973092/141083464-0eda851b-cc76-4863-b24d-4b1a70634bf5.png">
<img width="864" alt="Screen Shot 2021-11-10 at 9 35 41 AM"
src="https://user-images.githubusercontent.com/6973092/141083380-a81eafef-9f4a-4021-b0c2-4414bd978f18.png">
<img width="861" alt="Screen Shot 2021-11-10 at 9 37 34 AM"
src="https://user-images.githubusercontent.com/6973092/141083479-aaf6399f-e97c-4c95-b042-6818bd93115f.png">
**Additional context**
A wild guess: the memory leak may be related to data loss on the bookie side.
I've found some error logs there:
```
11:51:51.568 [BookieReadThreadPool-OrderedExecutor-3-0] INFO
org.apache.bookkeeper.proto.ReadEntryProcessorV3 - No ledger found while
reading entry: 0 from ledger: 242603
11:51:52.828 [bookie-io-1-1] INFO
org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id:
0x5d119911, L:/10.119.139.211:3181 ! R:/10.119.131.152:34376]
11:52:10.560 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.AuthHandler -
Authentication success on server side
11:52:10.560 [bookie-io-1-2] INFO
org.apache.bookkeeper.proto.BookieRequestHandler - Channel connected [id:
0xb66b4346, L:/10.119.139.211:3181 - R:/10.119.164.211:54744]
11:52:11.149 [BookieReadThreadPool-OrderedExecutor-0-0] ERROR
org.apache.bookkeeper.proto.ReadLacProcessorV3 - No ledger found while
performing readLac from ledger: 242603
org.apache.bookkeeper.bookie.Bookie$NoLedgerException: Ledger 242603 not
found
at
org.apache.bookkeeper.bookie.LedgerDescriptor.createReadOnly(LedgerDescriptor.java:52)
~[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at
org.apache.bookkeeper.bookie.HandleFactoryImpl.getReadOnlyHandle(HandleFactoryImpl.java:61)
~[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at org.apache.bookkeeper.bookie.Bookie.getExplicitLac(Bookie.java:1364)
~[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at
org.apache.bookkeeper.proto.ReadLacProcessorV3.getReadLacResponse(ReadLacProcessorV3.java:71)
[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at
org.apache.bookkeeper.proto.ReadLacProcessorV3.safeRun(ReadLacProcessorV3.java:118)
[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at
org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
[org.apache.bookkeeper-bookkeeper-common-4.12.0.jar:4.12.0]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_302]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_302]
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_302]
11:52:11.150 [BookieReadThreadPool-OrderedExecutor-0-0] ERROR
org.apache.bookkeeper.proto.ReadLacProcessorV3 - No ledger found while trying
to read last entry: 242603
org.apache.bookkeeper.bookie.Bookie$NoLedgerException: Ledger 242603 not
found
at
org.apache.bookkeeper.bookie.LedgerDescriptor.createReadOnly(LedgerDescriptor.java:52)
~[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at
org.apache.bookkeeper.bookie.HandleFactoryImpl.getReadOnlyHandle(HandleFactoryImpl.java:61)
~[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at org.apache.bookkeeper.bookie.Bookie.readEntry(Bookie.java:1441)
~[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at
org.apache.bookkeeper.proto.ReadLacProcessorV3.getReadLacResponse(ReadLacProcessorV3.java:86)
[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at
org.apache.bookkeeper.proto.ReadLacProcessorV3.safeRun(ReadLacProcessorV3.java:118)
[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at
org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
[org.apache.bookkeeper-bookkeeper-common-4.12.0.jar:4.12.0]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_302]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_302]
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_302]
11:52:11.163 [BookieReadThreadPool-OrderedExecutor-3-0] INFO
org.apache.bookkeeper.proto.ReadEntryProcessorV3 - No ledger found while
reading entry: 0 from ledger: 242603
```
This is because we encountered a Kubernetes node issue a long time ago, and
some of the ledgers were unintentionally deleted.
I can confirm that all the bookies are working fine without any memory leak
issue (apart from the exceptions above).
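One way I've been tracking the growth (a hypothetical check, assuming the broker's Prometheus endpoint is exposed on port 8080, the default for the HTTP service):

```shell
# Hypothetical diagnostic: sample the broker's JVM direct-memory gauges from
# the Prometheus metrics endpoint; a leak shows as monotonic growth in
# jvm_memory_direct_bytes_used toward jvm_memory_direct_bytes_max.
curl -s http://<broker-host>:8080/metrics | grep jvm_memory_direct_bytes
```

Sampling this over hours, alongside the pod memory graphs above, should show whether direct-memory usage ever drops back after traffic subsides.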
Please let me know if there's any clue, or anything I can do to diagnose the
memory leak.
Thanks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]