[ https://issues.apache.org/jira/browse/HBASE-27947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736481#comment-17736481 ]
Bryan Beaudreault commented on HBASE-27947:
-------------------------------------------
Here's an example stacktrace:
{code:java}
org.apache.hbase.thirdparty.io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 4194304 byte(s) of direct memory (used: 3070248230, max: 3073741824)
	at org.apache.hbase.thirdparty.io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:845) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:774) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:701) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:676) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:215) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:197) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.allocate(PoolArena.java:139) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.allocate(PoolArena.java:129) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:396) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:120) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollRecvByteAllocatorHandle.allocate(EpollRecvByteAllocatorHandle.java:75) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:785) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[hbase-shaded-netty-4.1.4.jar:?]
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[hbase-shaded-netty-4.1.4.jar:?]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
{code}
> RegionServer OOM under load when TLS is enabled
> -----------------------------------------------
>
> Key: HBASE-27947
> URL: https://issues.apache.org/jira/browse/HBASE-27947
> Project: HBase
> Issue Type: Bug
> Components: rpc
> Affects Versions: 2.6.0
> Reporter: Bryan Beaudreault
> Priority: Critical
>
> We are rolling out the server-side TLS settings to all of our QA clusters.
> This has mostly gone fine, except on 1 cluster. Most clusters, including this
> one, have a sampled {{nettyDirectMemory}} usage of about 30-100mb. This
> cluster tends to get bursts of traffic, in which case it would typically jump
> to 400-500mb. Again, this is sampled, so it could have been higher than that.
> When we enabled SSL on this cluster, we started seeing bursts up to at least
> 4gb. This exceeded our {{-XX:MaxDirectMemorySize}}, which caused OOMs and
> general chaos on the cluster.
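>
> For reference, the used/max counters in the error message above can be
> sampled directly from shaded netty. This is a minimal sketch; the class
> name here is illustrative, and the wiring behind HBase's actual
> {{nettyDirectMemory}} metric may differ:
> {code:java}
> import org.apache.hbase.thirdparty.io.netty.util.internal.PlatformDependent;
>
> public class NettyDirectMemorySample {
>   public static void main(String[] args) {
>     // These are the same counters reported in OutOfDirectMemoryError
>     // ("used: ..., max: ...") when netty tracks direct memory itself.
>     long used = PlatformDependent.usedDirectMemory();
>     long max = PlatformDependent.maxDirectMemory();
>     System.out.println("netty direct memory: used=" + used + " max=" + max);
>   }
> }
> {code}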
>
> We've gotten it under control a little bit by setting
> {{-Dorg.apache.hbase.thirdparty.io.netty.maxDirectMemory}} and
> {{-Dorg.apache.hbase.thirdparty.io.netty.tryReflectionSetAccessible}}.
> We've set netty's maxDirectMemory to be approximately equal to
> ({{-XX:MaxDirectMemorySize - BucketCacheSize - ReservoirSize}}). Now we
> are seeing netty's own OutOfDirectMemoryError, which is still causing pain
> for clients but at least insulates the other components of the RegionServer.
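>
> To make the arithmetic concrete, here's a minimal sizing sketch. Only the
> formula comes from this issue; every size below is a hypothetical example
> value, to be substituted with a cluster's actual settings:
> {code:java}
> // Hypothetical example sizes, not recommendations.
> long maxDirectMemorySize = 4L << 30; // -XX:MaxDirectMemorySize=4g
> long bucketCacheSize     = 2L << 30; // off-heap bucket cache allocation
> long reservoirSize       = 1L << 30; // RPC byte buffer reservoir allocation
>
> // Budget handed to shaded netty via
> // -Dorg.apache.hbase.thirdparty.io.netty.maxDirectMemory=<bytes>
> long nettyMaxDirectMemory = maxDirectMemorySize - bucketCacheSize - reservoirSize;
> System.out.println(nettyMaxDirectMemory); // 1073741824 (1gb)
> {code}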
>
> We're still digging into exactly why this is happening. The cluster clearly
> has a bad access pattern, but it doesn't seem like SSL should increase the
> memory footprint by 5-10x like we're seeing.