[
https://issues.apache.org/jira/browse/CASSANDRA-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826879#comment-15826879
]
Tom van der Woerdt commented on CASSANDRA-13126:
------------------------------------------------
Apparently I do!
{code}
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
ERROR [SharedPool-Worker-1] 2017-01-11 15:26:59,533 Message.java:617 - Unexpected exception during request; channel = [id: 0xc259e8df, /1.2.3.4:45232 => /5.6.7.8:9042]
io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:722) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:693) ~[na:1.8.0_112]
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_112]
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_112]
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.buffer.PoolArena.reallocate(PoolArena.java:277) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:108) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:146) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
... 9 common frames omitted
{code}
> native transport protocol corruption when using SSL
> ---------------------------------------------------
>
> Key: CASSANDRA-13126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13126
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Tom van der Woerdt
> Priority: Critical
>
> The following combination of conditions can leave client connections
> unusable:
> 1) Cassandra GC must be well-tuned, with short GC pauses every minute or so
> 2) *client* SSL must be enabled and transmitting a significant amount of data
> 3) Cassandra must run with the default library versions
> 4) -XX:+DisableExplicitGC must be set (this is the default in the current
> cassandra-env.sh)
> This ticket relates to CASSANDRA-13114, which is a possible workaround (but
> not a fix) for the SSL condition that triggers this bug.
> * Netty allocates nio.ByteBuffers for every outgoing SSL message.
> * ByteBuffers consist of two parts, the JVM object and the off-heap object.
> The JVM object is small and is collected in regular GC cycles; the off-heap
> object is freed only when the small JVM object is freed. To avoid exploding
> native memory use, the JVM by default limits direct buffer allocation to the
> max heap size. Allocating beyond that limit triggers a System.gc(), a retry,
> and potentially an exception.
> * System.gc() is a no-op under DisableExplicitGC.
> * This means ByteBuffer allocation is likely to throw an exception when too
> many buffers are allocated before a regular GC cycle frees the old ones.
> * The Netty version shipped in Cassandra is broken when using SSL (see
> CASSANDRA-13114) and allocates significantly too many ByteBuffers.
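
The direct-buffer behaviour described in the bullets above can be observed with a minimal sketch. The class and method names (DirectBufferSketch, tryAllocateDirect) are made up for illustration and are not Cassandra or Netty code; the only real API involved is java.nio.ByteBuffer.allocateDirect, the same call that appears in the "Caused by" part of the stack trace.

```java
import java.nio.ByteBuffer;
import java.util.Optional;

// Hypothetical helper, for illustration only.
final class DirectBufferSketch {

    /**
     * Tries to allocate an off-heap (direct) buffer. Returns empty when the
     * JVM refuses the allocation with an OutOfMemoryError, which is what
     * happens once the direct-memory limit is reached.
     */
    static Optional<ByteBuffer> tryAllocateDirect(int bytes) {
        try {
            return Optional.of(ByteBuffer.allocateDirect(bytes));
        } catch (OutOfMemoryError e) {
            // A direct-memory failure and a heap failure are both
            // OutOfMemoryError; this sketch does not tell them apart.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // By default the direct-memory limit follows the maximum heap size.
        System.out.println("max heap bytes: " + Runtime.getRuntime().maxMemory());

        Optional<ByteBuffer> buffer = tryAllocateDirect(64 * 1024);
        System.out.println("allocated: " + buffer.isPresent());
        System.out.println("direct: " + buffer.map(ByteBuffer::isDirect).orElse(false));
    }
}
```

The limit itself can be set explicitly with the -XX:MaxDirectMemorySize JVM option. Whether raising it is appropriate for a given node is outside what this ticket establishes; it would at best delay the failure described here.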
> This gets more complicated though.
> When /some/ clients use SSL and others don't, the clients not using SSL can
> still be affected by this bug, as the ByteBuffer starvation caused by SSL
> spills over to other users.
> ByteBuffers are used very early on in the native protocol as well. Before
> even being able to decode the network protocol, this error can be thrown:
> {noformat}
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
> {noformat}
> Note that this error comes back with stream_id 0, so the client cannot match
> it to the pending request and ends up waiting for the client timeout before
> the query is considered failed and retried.
> A few frames later on the same connection, this appears:
> {noformat}
> Provided frame does not appear to be Snappy compressed
> {noformat}
> And after that everything errors out with:
> {noformat}
> Invalid or unsupported protocol version (54); the lowest supported version is 3 and the greatest is 4
> {noformat}
> So this bug ultimately affects the binary protocol and the connection becomes
> useless if not downright dangerous.
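
One plausible reading of the "protocol version (54)" error is that the frame decoder has lost its place in the byte stream and is treating a payload byte as the start of a header. The sketch below parses the 9-byte header of native protocol v3/v4 (version, flags, 2-byte stream id, opcode, 4-byte body length) and shows that a stream starting with the ASCII character '6' decodes as version 54. FrameHeaderSketch is a hypothetical class written for this note, not Cassandra's decoder, and the ticket does not say which byte was actually misread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the v3/v4 frame header layout.
final class FrameHeaderSketch {
    static final int HEADER_LENGTH = 9;
    static final int MIN_VERSION = 3; // lowest version named in the error message
    static final int MAX_VERSION = 4; // greatest version named in the error message

    final boolean response;
    final int version;
    final int flags;
    final int streamId;
    final int opcode;
    final int bodyLength;

    private FrameHeaderSketch(boolean response, int version, int flags,
                              int streamId, int opcode, int bodyLength) {
        this.response = response;
        this.version = version;
        this.flags = flags;
        this.streamId = streamId;
        this.opcode = opcode;
        this.bodyLength = bodyLength;
    }

    /** Reads one header from the buffer's current position (big-endian). */
    static FrameHeaderSketch parse(ByteBuffer in) {
        if (in.remaining() < HEADER_LENGTH) {
            throw new IllegalArgumentException("need " + HEADER_LENGTH + " header bytes");
        }
        int first = in.get() & 0xFF;
        boolean response = (first & 0x80) != 0; // top bit: request or response
        int version = first & 0x7F;             // low 7 bits: protocol version
        int flags = in.get() & 0xFF;
        int streamId = in.getShort();           // signed 16-bit stream id
        int opcode = in.get() & 0xFF;
        int bodyLength = in.getInt();
        return new FrameHeaderSketch(response, version, flags, streamId, opcode, bodyLength);
    }

    boolean versionSupported() {
        return version >= MIN_VERSION && version <= MAX_VERSION;
    }

    public static void main(String[] args) {
        // A well-formed v4 request header: stream 7, opcode 0x07, 3-byte body.
        byte[] aligned = {0x04, 0x00, 0x00, 0x07, 0x07, 0, 0, 0, 3};
        FrameHeaderSketch ok = parse(ByteBuffer.wrap(aligned));
        System.out.println("aligned version: " + ok.version);

        // Arbitrary payload bytes read as if they were a header.
        byte[] misaligned = "6f3a91c2e".getBytes(StandardCharsets.US_ASCII);
        FrameHeaderSketch bad = parse(ByteBuffer.wrap(misaligned));
        System.out.println("misaligned version: " + bad.version
                + ", supported: " + bad.versionSupported());
    }
}
```

Once the decoder is misaligned like this, every later frame on the connection is parsed from the wrong offset, which fits the sequence of errors quoted above.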
> I think there are several things that need to be done here.
> * CASSANDRA-13114 should be fixed (easy, and probably needs to land in 3.0.11
> anyway)
> * Connections should be closed after a DecoderException
> * DisableExplicitGC should be removed from the default JVM arguments
> Any one of these three would limit the impact on clients.
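
The second proposal, closing the connection after a decode failure, can be sketched without Netty as follows. FailFastConnectionSketch and its methods are hypothetical and only model the idea: after the first failed decode the position in the byte stream can no longer be trusted, so the connection refuses further frames instead of parsing them from the wrong offset. An actual fix would have to close the channel inside Cassandra's Netty pipeline, which this sketch does not attempt.

```java
import java.nio.ByteBuffer;

// Hypothetical model of "close the connection after a decode error".
final class FailFastConnectionSketch {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    /** Decodes the version byte of one frame; closes the connection on failure. */
    int onFrame(ByteBuffer frame) {
        if (!open) {
            throw new IllegalStateException("connection already closed");
        }
        try {
            return decodeVersion(frame);
        } catch (RuntimeException e) {
            // Do not keep reading from a stream whose framing is unknown.
            open = false;
            throw e;
        }
    }

    private static int decodeVersion(ByteBuffer frame) {
        int version = frame.get() & 0x7F;
        if (version < 3 || version > 4) {
            throw new IllegalArgumentException(
                    "Invalid or unsupported protocol version (" + version + ")");
        }
        return version;
    }

    public static void main(String[] args) {
        FailFastConnectionSketch connection = new FailFastConnectionSketch();
        System.out.println("version: " + connection.onFrame(ByteBuffer.wrap(new byte[] {4})));
        try {
            connection.onFrame(ByteBuffer.wrap(new byte[] {54}));
        } catch (IllegalArgumentException e) {
            System.out.println("decode failed, open: " + connection.isOpen());
        }
    }
}
```

With this behaviour a client sees a closed connection and can reconnect, rather than receiving responses it cannot match to its requests.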