Bryan Beaudreault created HBASE-27947:
-----------------------------------------

             Summary: RegionServer OOM under load when TLS is enabled
                 Key: HBASE-27947
                 URL: https://issues.apache.org/jira/browse/HBASE-27947
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Bryan Beaudreault


We are rolling out the server-side TLS settings to all of our QA clusters. This 
has mostly gone fine, except on one cluster. Most clusters, including this one, 
have a sampled {{nettyDirectMemory}} usage of about 30-100 MB. This cluster 
tends to get bursts of traffic, during which usage would typically jump to 
400-500 MB. Again, this is sampled, so the true peak could have been higher. When 
we enabled SSL on this cluster, we started seeing bursts up to at least 4 GB. 
This exceeded our {{-XX:MaxDirectMemorySize}}, which caused OOMs and 
general chaos on the cluster.
 
We've gotten it somewhat under control by setting 
{{-Dorg.apache.hbase.thirdparty.io.netty.maxDirectMemory}} and 
{{-Dorg.apache.hbase.thirdparty.io.netty.tryReflectionSetAccessible}}. 
We've set netty's maxDirectMemory to be approximately equal to 
({{-XX:MaxDirectMemorySize - BucketCacheSize - ReservoirSize}}). Now we are 
seeing netty's own OutOfDirectMemoryError, which is still causing pain for 
clients but at least insulates the other components of the regionserver.
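
As a rough illustration of the sizing rule above, here is a minimal sketch in Python. The function names and the example sizes (8 GiB, 4 GiB, 1 GiB) are hypothetical and are not values from our clusters. It also assumes the netty property takes a plain byte count.

```python
GIB = 1024 ** 3  # bytes per GiB


def netty_max_direct_memory(max_direct_memory_size, bucket_cache_size, reservoir_size):
    """Bytes left for netty once the other direct-memory users are subtracted.

    All arguments are byte counts. This mirrors the rule
    MaxDirectMemorySize - BucketCacheSize - ReservoirSize.
    """
    budget = max_direct_memory_size - bucket_cache_size - reservoir_size
    if budget <= 0:
        raise ValueError("no direct memory left for netty with these sizes")
    return budget


def netty_flag(budget_bytes):
    """Format the JVM system property (assumed to accept a byte count)."""
    return f"-Dorg.apache.hbase.thirdparty.io.netty.maxDirectMemory={budget_bytes}"


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    budget = netty_max_direct_memory(8 * GIB, 4 * GIB, 1 * GIB)
    print(netty_flag(budget))
```

With these made-up sizes, netty would be capped at 3 GiB, so a burst fails inside netty rather than exhausting the direct memory shared with the bucket cache and the reservoir.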
 
We're still digging into exactly why this is happening. The cluster clearly has 
a bad access pattern, but it doesn't seem like SSL should increase the memory 
footprint by 5-10x like we're seeing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
