[ 
https://issues.apache.org/jira/browse/ARTEMIS-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136376#comment-17136376
 ] 

Bas commented on ARTEMIS-2696:
------------------------------

I'm not an expert on Netty processing, but the buffer is local, and from the leak 
report above it looks like it is passed along to the handlers, going through a 
writeAndFlush call in the stack trace. So based on this specific case in the 
leak report, it seems to me the buffer should be released at the point where 
the handlers have finished with the pushBytes output (see the sketch below).
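
To illustrate the ownership rule I mean, here is a minimal sketch with 
placeholder names (allocator, channel, size), not the Artemis code:
{noformat}
ByteBuf bytes = allocator.directBuffer(size); // refCnt == 1, owned by us
channel.writeAndFlush(bytes); // ownership transfers to the pipeline: Netty
                              // releases the buffer after the socket write,
                              // unless a handler along the way copies it into
                              // new messages and never releases the original
{noformat}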

The patched version crashed after a day and a half; that could be related to 
the high load at that point. I restarted it on Friday and it has been running 
fine since then, but no high load passed through the server during that time.

I want to do some load tests and see if I can also look into your concerns.

> Netty DirectBufferLeak noticed in the log and server not processing messages 
> with out of direct memory errors after running for a day
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-2696
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2696
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.11.0, 2.13.0
>         Environment: Ubuntu 18.04.1 LTS
> Netty 4.1.45
> Qpid 0.48
>  
>  
>            Reporter: Bas
>            Priority: Major
>
> Our servers started to crash daily once we were on the 2.11.0 version.
> The crashes did not kill the entire Artemis server but caused endless log 
> statements of the following:
> {noformat}
> Transport failed: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 637534215, max: 652738560)
> {noformat}
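> (For context: 16777216 bytes is, by default, the 16 MiB chunk size of Netty's 
> pooled allocator, and the max of 652738560 bytes, roughly 622 MiB, presumably 
> corresponds to our direct-memory cap, so by this point leaked buffers had 
> pinned nearly the whole pool. Raising the cap with the standard JVM flag below 
> would only delay the failure, not fix the leak:)
> {noformat}
> -XX:MaxDirectMemorySize=1g
> {noformat}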
> We launched an investigation and activated Netty's advanced leak detection: 
> {{-Dio.netty.leakDetectionLevel=advanced}}
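> The same level can also be set programmatically via Netty's 
> {{ResourceLeakDetector}} API:
> {noformat}
> io.netty.util.ResourceLeakDetector.setLevel(
>       io.netty.util.ResourceLeakDetector.Level.ADVANCED);
> {noformat}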
> After that we indeed noticed leak detection log entries with the following 
> information:
> #1:
> {noformat}
> io.netty.buffer.AdvancedLeakAwareByteBuf.readBytes(AdvancedLeakAwareByteBuf.java:484)
> org.apache.activemq.artemis.core.server.protocol.websocket.WebSocketFrameEncoder.writeContinuationFrame(WebSocketFrameEncoder.java:56)
> org.apache.activemq.artemis.core.server.protocol.websocket.WebSocketFrameEncoder.write(WebSocketFrameEncoder.java:45)
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:715)
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:762)
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:788)
> io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:756)
> io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1020)
> io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:299)
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:382)
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:286)
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:281)
> org.apache.activemq.artemis.protocol.amqp.broker.AMQPConnectionCallback.onTransport(AMQPConnectionCallback.java:203)
> org.apache.activemq.artemis.protocol.amqp.proton.AMQPConnectionContext.pushBytes(AMQPConnectionContext.java:384)
> org.apache.activemq.artemis.protocol.amqp.proton.handler.ProtonHandler.actualFlush(ProtonHandler.java:210)
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
> {noformat}
>  
> #2: Hint: 'websocket-frame-encoder' will handle the message from this point.
> {noformat}
> io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:116)
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:784)
> io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:756)
> io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1020)
> io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:299)
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:382)
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:286)
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:281)
> org.apache.activemq.artemis.protocol.amqp.broker.AMQPConnectionCallback.onTransport(AMQPConnectionCallback.java:203)
> org.apache.activemq.artemis.protocol.amqp.proton.AMQPConnectionContext.pushBytes(AMQPConnectionContext.java:384)
> org.apache.activemq.artemis.protocol.amqp.proton.handler.ProtonHandler.actualFlush(ProtonHandler.java:210)
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
> {noformat}
> #3:
> {noformat}
> io.netty.buffer.AdvancedLeakAwareByteBuf.writeBytes(AdvancedLeakAwareByteBuf.java:622)
> org.apache.activemq.artemis.protocol.amqp.proton.handler.ProtonHandler.actualFlush(ProtonHandler.java:207)
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
> 
> Created at:
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:349)
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
> org.apache.activemq.artemis.protocol.amqp.proton.handler.ProtonHandler.actualFlush(ProtonHandler.java:206)
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
> {noformat}
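> Reading the "Created at" trace together with trace #1, the buffer's lifecycle 
> appears to be roughly the following (a hypothetical reconstruction with 
> illustrative names, not the actual Artemis code):
> {noformat}
> ByteBuf bytes = PooledByteBufAllocator.DEFAULT.directBuffer(size); // "Created at": ProtonHandler.actualFlush:206
> bytes.writeBytes(protonOutput);                                    // trace #3: actualFlush:207
> connection.write(bytes); // trace #1: ends in channel.writeAndFlush(), which
>                          // hands the buffer to the pipeline; whichever handler
>                          // consumes it last must do the final release()
> {noformat}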
>  
> We looked into the source code of some classes in the above stack traces, 
> searching for changes where a {{directBuffer.release()}} would be missing, but 
> did not find any. It is also tough to grasp what is happening in these classes, 
> because I would really need to study the underlying concepts to understand what 
> is happening and what needs to happen, which will take a lot of time. But one 
> thing did catch my attention, although I might be totally wrong in estimating 
> that it is related to the issue.
>   
>  The only thing I have on this is that the following class has a direct 
> relation with a class in the Netty leak report: 
> [https://github.com/apache/activemq-artemis/blob/master/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/protocol/websocket/WebSocketFrameEncoder.java]
> This class was added in 
> [this|https://github.com/apache/activemq-artemis/commit/9fac4b866cf9cefbb6f7c13b820e07455b6649f5]
>  commit, which does something with a WebSocket continuation frame. That commit 
> is not in 2.10:
> {noformat}
> $ git tag --contains 9fac4b8
>  2.11.0
> {noformat}
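> To make the suspicion concrete: {{readBytes()}} copies into a newly allocated 
> buffer and leaves the source buffer's reference count untouched, so an encoder 
> that splits a {{ByteBuf}} into continuation frames must still release the 
> source itself. A minimal sketch of that pattern (a hypothetical handler, not 
> the actual Artemis code):
> {noformat}
> import io.netty.buffer.ByteBuf;
> import io.netty.channel.ChannelHandlerContext;
> import io.netty.channel.ChannelOutboundHandlerAdapter;
> import io.netty.channel.ChannelPromise;
> import io.netty.handler.codec.http.websocketx.ContinuationWebSocketFrame;
> 
> public class FragmentingEncoderSketch extends ChannelOutboundHandlerAdapter {
>    private final int maxFrameSize = 1024; // illustrative frame payload limit
> 
>    @Override
>    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) {
>       ByteBuf buffer = (ByteBuf) msg;
>       try {
>          while (buffer.isReadable()) {
>             boolean last = buffer.readableBytes() <= maxFrameSize;
>             // readBytes() allocates and fills a NEW buffer for the fragment
>             ByteBuf fragment = buffer.readBytes(Math.min(buffer.readableBytes(), maxFrameSize));
>             ctx.write(new ContinuationWebSocketFrame(last, 0, fragment),
>                       last ? promise : ctx.voidPromise());
>          }
>       } finally {
>          buffer.release(); // without this, every fragmented write leaks the source
>       }
>    }
> }
> {noformat}
> If this is indeed the cause, the {{finally}} release above (or a 
> {{readRetainedSlice()}} approach that avoids copying) would be the kind of 
> change to look for.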
> So, we wanted to test whether rolling back to 2.10 would fix the issue. The 
> 2.10 version has now been running for 3 days without issues, while 2.11 would 
> have crashed with out-of-memory errors 3 times in that period.
> The issue could be in a different commit/location than what is analysed above, 
> but 2.10 seems to work without issues.
> We also wiped our entire journal before going to 2.10, so I want to test 2.11 
> again with a clean journal and see whether wiping the data store is what made 
> 2.10 function better. More information will follow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
