[ https://issues.apache.org/jira/browse/HBASE-27947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737348#comment-17737348 ]
Bryan Beaudreault commented on HBASE-27947:
-------------------------------------------
FYI, I swapped to using netty-tcnative with BoringSSL and am still seeing lots of
OOMs, but the stack trace has changed. It now appears to originate mostly in
the writeAndFlush call path. Example:
{code:java}
org.apache.hbase.thirdparty.io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 4194304 byte(s) of direct memory (used: 3070234510, max: 3073741824)
    at org.apache.hbase.thirdparty.io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:845) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:774) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:701) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:676) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:215) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.tcacheAllocateSmall(PoolArena.java:180) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.allocate(PoolArena.java:137) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.allocate(PoolArena.java:129) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:396) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.handler.ssl.SslHandler$SslEngineType$1.allocateWrapBuffer(SslHandler.java:232) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.handler.ssl.SslHandler.allocateOutNetBuf(SslHandler.java:2266) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:825) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:799) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.handler.ssl.SslHandler.flush(SslHandler.java:780) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:925) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:907) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:893) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:125) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:925) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:941) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[hbase-shaded-netty-4.1.4.jar:?]
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[hbase-shaded-netty-4.1.4.jar:?]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]
{code}
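To make the numbers in that error message concrete: the failing request is a 4194304-byte (4 MiB) allocation coming from {{newChunk}}, while only about 3.5 MB was left under the cap. The following is a minimal sketch of that arithmetic, assuming the cap behaves like a simple "used + requested must not exceed max" counter. It is a simplified model for illustration, not netty's actual code, and the class name {{DirectMemoryBudgetSketch}} is hypothetical.

```java
/**
 * Simplified model of a capped direct-memory counter.
 * Illustrative only: this is NOT netty's implementation, just the
 * "used + requested <= max" check applied to the figures in the trace.
 */
final class DirectMemoryBudgetSketch {
    private final long maxBytes;
    private long usedBytes;

    DirectMemoryBudgetSketch(long maxBytes, long usedBytes) {
        this.maxBytes = maxBytes;
        this.usedBytes = usedBytes;
    }

    /** Bytes that can still be reserved before the cap is reached. */
    long headroom() {
        return maxBytes - usedBytes;
    }

    /** Reserves the bytes if they fit under the cap; returns false otherwise. */
    boolean tryReserve(long requestedBytes) {
        if (usedBytes + requestedBytes > maxBytes) {
            return false;
        }
        usedBytes += requestedBytes;
        return true;
    }

    public static void main(String[] args) {
        // "max" and "used" are copied from the error message above.
        DirectMemoryBudgetSketch budget =
            new DirectMemoryBudgetSketch(3_073_741_824L, 3_070_234_510L);
        System.out.println("headroom bytes: " + budget.headroom());
        // The failing request in the trace is 4194304 bytes.
        System.out.println("4 MiB chunk fits: " + budget.tryReserve(4_194_304L));
    }
}
```

In this model the pool is effectively full: any request that needs a fresh 4 MiB chunk fails until existing buffers are released.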
> RegionServer OOM under load when TLS is enabled
> -----------------------------------------------
>
> Key: HBASE-27947
> URL: https://issues.apache.org/jira/browse/HBASE-27947
> Project: HBase
> Issue Type: Bug
> Components: rpc
> Affects Versions: 2.6.0
> Reporter: Bryan Beaudreault
> Priority: Critical
>
> We are rolling out the server-side TLS settings to all of our QA clusters.
> This has mostly gone fine, except on one cluster. Most clusters, including this
> one, have a sampled {{nettyDirectMemory}} usage of about 30-100 MB. This
> cluster tends to get bursts of traffic, in which case it would typically jump
> to 400-500 MB. Again, this is sampled, so it could have been higher than that.
> When we enabled SSL on this cluster, we started seeing bursts up to at least
> 4 GB. This exceeded our {{-XX:MaxDirectMemorySize}}, which caused OOMs
> and general chaos on the cluster.
>
> We've gotten it under control a little bit by setting
> {{-Dorg.apache.hbase.thirdparty.io.netty.maxDirectMemory}} and
> {{-Dorg.apache.hbase.thirdparty.io.netty.tryReflectionSetAccessible}}.
> We've set netty's maxDirectMemory to be approximately equal to
> ({{-XX:MaxDirectMemorySize - BucketCacheSize - ReservoirSize}}). Now we
> are seeing netty's own OutOfDirectMemoryError, which is still causing pain
> for clients but at least insulates the other components of the regionserver.
>
> We're still digging into exactly why this is happening. The cluster clearly
> has a bad access pattern, but it doesn't seem like SSL should increase the
> memory footprint by 5-10x like we're seeing.
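The sizing rule quoted in the description (netty's maxDirectMemory roughly equal to {{-XX:MaxDirectMemorySize}} minus the bucket cache and reservoir sizes) can be sketched as below. This is only an illustration under stated assumptions: the class and method names are hypothetical, the three sizes in {{main}} are made-up example values rather than figures from this cluster, and the property value is assumed to be a plain byte count.

```java
/**
 * Sketch of the sizing rule from the issue description:
 *   netty budget ~= MaxDirectMemorySize - BucketCacheSize - ReservoirSize
 * All names here are hypothetical; the sizes in main() are invented examples.
 */
final class NettyDirectMemoryFlagSketch {
    /** Property name as quoted in the issue description. */
    static final String PROPERTY =
        "org.apache.hbase.thirdparty.io.netty.maxDirectMemory";

    /** Direct memory left for netty after the other direct-memory consumers. */
    static long nettyBudgetBytes(long maxDirectMemoryBytes,
                                 long bucketCacheBytes,
                                 long reservoirBytes) {
        long budget = maxDirectMemoryBytes - bucketCacheBytes - reservoirBytes;
        if (budget <= 0) {
            throw new IllegalArgumentException(
                "no direct memory left for netty: " + budget);
        }
        return budget;
    }

    /** Formats the JVM flag, assuming the value is given in bytes. */
    static String jvmFlag(long budgetBytes) {
        return "-D" + PROPERTY + "=" + budgetBytes;
    }

    public static void main(String[] args) {
        long maxDirect = 8L << 30;   // hypothetical 8 GiB -XX:MaxDirectMemorySize
        long bucketCache = 4L << 30; // hypothetical 4 GiB off-heap bucket cache
        long reservoir = 1L << 30;   // hypothetical 1 GiB buffer reservoir
        long budget = nettyBudgetBytes(maxDirect, bucketCache, reservoir);
        System.out.println(jvmFlag(budget));
    }
}
```

Capping netty this way does not reduce its demand for direct memory. It only ensures that a burst fails inside netty with OutOfDirectMemoryError instead of exhausting the JVM-wide limit shared with the other components.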
--
This message was sent by Atlassian Jira
(v8.20.10#820010)