andrekramer1 opened a new issue #5896: Out of Memory with many consumers URL: https://github.com/apache/pulsar/issues/5896 Load testing on one standalone Pulsar with 4 producers and many consumers (1000s) we are getting out of memory errors - failing to allocate direct memory. Increasing memory has not helped with enough producers, Pulsar is failing with exceptions like the following when there are many consumers: ------------------------------------------------------------------------------------------------------- [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ClientCnx - [minnow.apama.com/10.42.100.81:6650] Got exception OutOfMemoryError : GC overhead limit exceeded java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.pulsar.shade.io.netty.util.Recycler$Stack.newHandle(Recycler.java:658) at org.apache.pulsar.shade.io.netty.util.Recycler.get(Recycler.java:163) at org.apache.pulsar.common.api.proto.PulsarApi$MessageIdData$Builder.buildPartial(PulsarApi.java:1339) at org.apache.pulsar.common.api.proto.PulsarApi$CommandMessage$Builder.mergeFrom(PulsarApi.java:15892) at org.apache.pulsar.common.api.proto.PulsarApi$CommandMessage$Builder.mergeFrom(PulsarApi.java:15745) at org.apache.pulsar.common.util.protobuf.ByteBufCodedInputStream.readMessage(ByteBufCodedInputStream.java:124) at org.apache.pulsar.common.api.proto.PulsarApi$BaseCommand$Builder.mergeFrom(PulsarApi.java:28839) at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:83) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:426) at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) at org.apache.pulsar.shade.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799) at org.apache.pulsar.shade.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:382) at org.apache.pulsar.shade.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) at org.apache.pulsar.shade.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:335) at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:745) ------------------------------------------------------------------------------------------------------- Looking into a memory dump showed many buffers allocated to nio/netty. This issue could be related to 5751, 5720. We were load testing on a single Linux host with plenty of physical memory. Possibly related to issues 5513, 4196, 4632, we've also seen one crash due to Direct memory error but that seemed related to Bookkeeper processing: ------------------------------------------------------------------------------------------------------- 9:50:46.759 [BookieShutdownTrigger] ERROR org.apache.bookkeeper.bookie.BookieThread - Uncaught exception in thread BookieShutdownTrigger io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 134217728, max: 134217728) at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:655) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:610) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:769) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:745) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.buffer.PoolArena.allocate(PoolArena.java:226) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.buffer.PoolArena.allocate(PoolArena.java:146) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:113) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final] at org.apache.bookkeeper.bookie.EntryLogger$BufferedLogChannel.appendLedgersMap(EntryLogger.java:145) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] at org.apache.bookkeeper.bookie.EntryLogManagerBase.createNewLog(EntryLogManagerBase.java:159) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] at org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog.getCurrentLogForLedgerForAddEntry(EntryLogManagerForSingleEntryLog.java:106) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] at org.apache.bookkeeper.bookie.EntryLogManagerBase.addEntry(EntryLogManagerBase.java:72) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] at org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog.addEntry(EntryLogManagerForSingleEntryLog.java:87) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] at org.apache.bookkeeper.bookie.EntryLogger.addEntry(EntryLogger.java:619) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.lambda$checkpoint$6(SingleDirectoryDbLedgerStorage.java:597) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] at org.apache.bookkeeper.bookie.storage.ldb.WriteCache.forEach(WriteCache.java:268) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.checkpoint(SingleDirectoryDbLedgerStorage.java:595) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.flush(SingleDirectoryDbLedgerStorage.java:686) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.shutdown(SingleDirectoryDbLedgerStorage.java:221) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.shutdown(DbLedgerStorage.java:161) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] at org.apache.bookkeeper.bookie.Bookie.shutdown(Bookie.java:1172) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] at org.apache.bookkeeper.bookie.Bookie$6.run(Bookie.java:1132) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2] 250.253: Total time for which application threads were stopped: 0.0002466 seconds, Stopping threads took: 0.0000837 seconds 09:50:46.839 [BookieDeathWatcher-3181] INFO org.apache.bookkeeper.proto.BookieServer - BookieDeathWatcher noticed the bookie is not running any more, exiting the watch loop! 250.325: Total time for which application threads were stopped: 0.0013974 seconds, Stopping threads took: 0.0011630 seconds 09:50:46.842 [component-shutdown-thread] INFO org.apache.bookkeeper.common.component.ComponentStarter - Closing component bookie-server in shutdown hook. 09:50:46.846 [component-shutdown-thread] INFO org.apache.bookkeeper.proto.BookieServer - Shutting down BookieServer
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
