andrekramer1 opened a new issue #5896: Out of Memory with many consumers
URL: https://github.com/apache/pulsar/issues/5896
 
 
   While load testing a single standalone Pulsar instance with 4 producers
and many consumers (1000s), we are getting out-of-memory errors, including
failures to allocate direct memory. Increasing memory has not helped: with
enough producers, Pulsar still fails with exceptions like the following
when there are many consumers:
   
-------------------------------------------------------------------------------------------------------
   [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ClientCnx - [minnow.apama.com/10.42.100.81:6650] Got exception OutOfMemoryError : GC overhead limit exceeded
   java.lang.OutOfMemoryError: GC overhead limit exceeded
           at org.apache.pulsar.shade.io.netty.util.Recycler$Stack.newHandle(Recycler.java:658)
           at org.apache.pulsar.shade.io.netty.util.Recycler.get(Recycler.java:163)
           at org.apache.pulsar.common.api.proto.PulsarApi$MessageIdData$Builder.buildPartial(PulsarApi.java:1339)
           at org.apache.pulsar.common.api.proto.PulsarApi$CommandMessage$Builder.mergeFrom(PulsarApi.java:15892)
           at org.apache.pulsar.common.api.proto.PulsarApi$CommandMessage$Builder.mergeFrom(PulsarApi.java:15745)
           at org.apache.pulsar.common.util.protobuf.ByteBufCodedInputStream.readMessage(ByteBufCodedInputStream.java:124)
           at org.apache.pulsar.common.api.proto.PulsarApi$BaseCommand$Builder.mergeFrom(PulsarApi.java:28839)
           at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:83)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
           at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
           at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
           at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:426)
           at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
           at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
           at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
           at org.apache.pulsar.shade.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799)
           at org.apache.pulsar.shade.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:382)
           at org.apache.pulsar.shade.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
           at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
           at org.apache.pulsar.shade.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:335)
           at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909)
           at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
           at java.lang.Thread.run(Thread.java:745)
   
-------------------------------------------------------------------------------------------------------
   Looking into a memory dump showed many buffers allocated to nio/netty.
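
The trace above was logged by a client I/O thread while decoding an incoming message, so one thing worth ruling out is client-side prefetching. The following is a rough back-of-envelope sketch, not a diagnosis: it assumes every consumer fills its receiver queue (the Pulsar Java client defaults to 1000 messages per consumer), and both the consumer count and the average message size are hypothetical values that the report does not give.

```java
final class PrefetchEstimate {
    /**
     * Upper bound on the bytes held in receiver queues if every consumer's
     * queue is completely full. Uses long arithmetic to avoid int overflow.
     */
    static long worstCaseBytes(int consumers, int receiverQueueSize, int avgMessageBytes) {
        return (long) consumers * receiverQueueSize * avgMessageBytes;
    }

    public static void main(String[] args) {
        int consumers = 2000;          // hypothetical: the report only says "1000s"
        int receiverQueueSize = 1000;  // default receiver queue size in the Pulsar Java client
        int avgMessageBytes = 1024;    // hypothetical payload size, not stated in the report

        long bytes = worstCaseBytes(consumers, receiverQueueSize, avgMessageBytes);
        System.out.printf("worst case buffered: %d bytes (%.2f GiB)%n",
                bytes, bytes / (1024.0 * 1024.0 * 1024.0));
    }
}
```

If an estimate like this turns out to be in the same range as the configured heap, lowering the queue size through `ConsumerBuilder.receiverQueueSize(int)` may be worth trying in the load test.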
   
   This issue could be related to #5751 and #5720.
   
   We were load testing on a single Linux host with plenty of physical memory.
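
Physical memory on the host only helps up to the limits the JVM was started with, so it can be useful to check what those limits actually are. The sketch below uses only standard JDK management APIs to print the maximum heap and the JDK-tracked buffer pools. The class and method names are made up for illustration. One caveat: Netty allocations made without a cleaner (the `allocateDirectNoCleaner` path) are tracked by Netty's own counter and may not show up in these JDK pool figures.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

final class JvmMemoryLimits {
    /** Bytes currently used per JDK buffer pool (typically "direct" and "mapped"). */
    static Map<String, Long> bufferPoolUsage() {
        Map<String, Long> usage = new LinkedHashMap<>();
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            usage.put(pool.getName(), pool.getMemoryUsed());
        }
        return usage;
    }

    /** Maximum heap the JVM will attempt to use, as set by -Xmx or the JVM default. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("max heap bytes: " + maxHeapBytes());
        bufferPoolUsage().forEach((name, used) ->
                System.out.println(name + " pool used bytes: " + used));
    }
}
```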
   
   
   Possibly related to issues #5513, #4196 and #4632: we have also seen one
crash caused by a direct memory error, but that one seemed related to
BookKeeper processing:
   
-------------------------------------------------------------------------------------------------------
   
   9:50:46.759 [BookieShutdownTrigger] ERROR org.apache.bookkeeper.bookie.BookieThread - Uncaught exception in thread BookieShutdownTrigger
   io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 134217728, max: 134217728)
           at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:655) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:610) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:769) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:745) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.PoolArena.allocate(PoolArena.java:226) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.PoolArena.allocate(PoolArena.java:146) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:113) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at org.apache.bookkeeper.bookie.EntryLogger$BufferedLogChannel.appendLedgersMap(EntryLogger.java:145) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.EntryLogManagerBase.createNewLog(EntryLogManagerBase.java:159) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog.getCurrentLogForLedgerForAddEntry(EntryLogManagerForSingleEntryLog.java:106) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.EntryLogManagerBase.addEntry(EntryLogManagerBase.java:72) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog.addEntry(EntryLogManagerForSingleEntryLog.java:87) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.EntryLogger.addEntry(EntryLogger.java:619) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.lambda$checkpoint$6(SingleDirectoryDbLedgerStorage.java:597) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.storage.ldb.WriteCache.forEach(WriteCache.java:268) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.checkpoint(SingleDirectoryDbLedgerStorage.java:595) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.flush(SingleDirectoryDbLedgerStorage.java:686) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.shutdown(SingleDirectoryDbLedgerStorage.java:221) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.shutdown(DbLedgerStorage.java:161) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.Bookie.shutdown(Bookie.java:1172) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.Bookie$6.run(Bookie.java:1132) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
   250.253: Total time for which application threads were stopped: 0.0002466 seconds, Stopping threads took: 0.0000837 seconds
   09:50:46.839 [BookieDeathWatcher-3181] INFO  org.apache.bookkeeper.proto.BookieServer - BookieDeathWatcher noticed the bookie is not running any more, exiting the watch loop!
   250.325: Total time for which application threads were stopped: 0.0013974 seconds, Stopping threads took: 0.0011630 seconds
   09:50:46.842 [component-shutdown-thread] INFO  org.apache.bookkeeper.common.component.ComponentStarter - Closing component bookie-server in shutdown hook.
   09:50:46.846 [component-shutdown-thread] INFO  org.apache.bookkeeper.proto.BookieServer - Shutting down BookieServer
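
The numbers in the `OutOfDirectMemoryError` message of the BookKeeper trace are easier to read in MiB: the bookie asked for one 16 MiB chunk while 128 MiB of a 128 MiB limit was already in use, so there was no headroom left at all. A small sketch that extracts and converts those figures follows. The class name and the regular expression are illustrative assumptions built around the message text as it appears in the log above, not part of Netty.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DirectMemoryError {
    // Matches the message format seen in the log above (assumed stable for this sketch).
    private static final Pattern MESSAGE = Pattern.compile(
            "failed to allocate (\\d+) byte\\(s\\) of direct memory \\(used: (\\d+), max: (\\d+)\\)");

    /** Returns {requested, used, max} in bytes, or null if the message does not match. */
    static long[] parse(String message) {
        Matcher m = MESSAGE.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    /** Converts bytes to whole MiB (integer division). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        String msg = "io.netty.util.internal.OutOfDirectMemoryError: failed to allocate "
                + "16777216 byte(s) of direct memory (used: 134217728, max: 134217728)";
        long[] v = parse(msg);
        System.out.printf("requested %d MiB, used %d MiB of %d MiB, headroom %d bytes%n",
                toMiB(v[0]), toMiB(v[1]), toMiB(v[2]), v[2] - v[1]);
    }
}
```

A limit of 128 MiB is small next to "plenty of physical memory", which suggests checking how the direct memory ceiling was configured for the process (for example via the standard `-XX:MaxDirectMemorySize` JVM flag) before concluding that raising host memory has no effect.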
