[
https://issues.apache.org/jira/browse/HBASE-27112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555293#comment-17555293
]
Viraj Jasani commented on HBASE-27112:
--------------------------------------
I think what [~zhangduo] mentioned in this comment might be worth a try;
wondering if it can be quickly tested (might need Duo's help):
{quote}Actually, when auth-int or auth-conf is used, we will copy the bytes
from netty's BB to an on-heap byte array, wrap or unwrap it, and then just use
Unpooled.wrappedBuffer to pass the on-heap byte array to later handlers. In this
way, actually we can release netty's native byte buf earlier...
[https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/security/SaslWrapHandler.java]
[https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/security/SaslUnwrapHandler.java]
{quote}
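The pattern Duo describes, copy out of the native buffer, release it right away, then do the potentially slow SASL unwrap on the heap copy, can be sketched with stdlib stand-ins. The RefCountedDirectBuf class and the no-op unwrap below are hypothetical placeholders for Netty's ByteBuf and SaslClient.unwrap(), not real HBase or Netty APIs:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EarlyReleaseSketch {
    // Hypothetical stand-in for Netty's reference-counted direct ByteBuf.
    static final class RefCountedDirectBuf {
        private final ByteBuffer direct;
        private int refCnt = 1;
        RefCountedDirectBuf(byte[] contents) {
            direct = ByteBuffer.allocateDirect(contents.length);
            direct.put(contents).flip();
        }
        byte[] copyToHeap() {
            byte[] heap = new byte[direct.remaining()];
            direct.duplicate().get(heap);
            return heap;
        }
        void release() { refCnt--; }   // native memory is reclaimable once refCnt hits 0
        int refCnt() { return refCnt; }
    }

    // Hypothetical "unwrap" standing in for SaslClient.unwrap(); here just a copy.
    static byte[] unwrap(byte[] wrapped) {
        return Arrays.copyOf(wrapped, wrapped.length);
    }

    public static void main(String[] args) {
        RefCountedDirectBuf buf =
                new RefCountedDirectBuf("wrapped-rpc-bytes".getBytes(StandardCharsets.UTF_8));
        byte[] heapCopy = buf.copyToHeap();
        buf.release();                   // release the native buffer *before* unwrapping,
        byte[] plain = unwrap(heapCopy); // so slow SASL work no longer pins direct memory
        System.out.println(buf.refCnt() + " " + new String(plain, StandardCharsets.UTF_8));
    }
}
```

The point of the ordering is only that the direct buffer's lifetime ends at the copy, not at the end of the pipeline stage, which is what would let Netty reclaim native memory earlier under load.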
> Investigate Netty resource usage limits
> ---------------------------------------
>
> Key: HBASE-27112
> URL: https://issues.apache.org/jira/browse/HBASE-27112
> Project: HBase
> Issue Type: Sub-task
> Components: IPC/RPC
> Affects Versions: 2.5.0
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-4
>
>
> We leave Netty level resource limits unbounded. The number of threads to use
> for the event loop is default 0 (unbounded). The default for
> io.netty.eventLoop.maxPendingTasks is INT_MAX.
> We don't do that for our own RPC handlers. We have a notion of maximum
> handler pool size, with a default of 30, typically raised in production by
> the user. We constrain the depth of the request queue in multiple ways:
> limits on the number of queued calls, limits on the total size of call data
> that can be queued (to avoid memory usage overrun), CoDel conditioning of the
> call queues if it is enabled, and so on.
> Under load, can we pile up an excess of pending request state, such as direct
> buffers containing request bytes, at the Netty layer because of downstream
> resource limits? Those limits will act as a bottleneck, as intended, and
> previously they would also have applied backpressure through RPC, because
> SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size",
> default 10); Netty, in comparison, may be able to queue up a lot more,
> because it has been optimized to prefer concurrency.
> Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0
> (unbounded). I don't know what it can actually get up to in production,
> because we lack the metric, but there are diminishing returns once threads >
> cores, so a reasonable default here might be
> Runtime.getRuntime().availableProcessors() instead of unbounded.
> maxPendingTasks probably should not be INT_MAX, but that may matter less.
> The tasks here are:
> - Instrument netty level resources to understand better actual resource
> allocations under load. Investigate what we need to plug in where to gain
> visibility.
> - Where instrumentation designed for this issue can be implemented with low
> overhead, consider formally adding it as a metric.
> - Based on the findings from this instrumentation, consider and implement
> next steps. The goal would be to limit concurrency at the Netty layer in such
> a way that performance remains good and resource usage there does not balloon
> under load.
> If the instrumentation and experimental results indicate no changes are
> necessary, we can close this as Not A Problem or WontFix.
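The bounding the description floats, a cores-sized thread count plus a finite pending-task queue, can be illustrated with a stdlib executor. This is only an analogue of hbase.netty.eventloop.rpcserver.thread.count and io.netty.eventLoop.maxPendingTasks, not Netty's actual event loop, and the maxPendingTasks value of 1024 is an arbitrary placeholder:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedLoopSketch {
    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors(); // instead of unbounded
        int maxPendingTasks = 1024;                               // instead of INT_MAX
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(maxPendingTasks),        // finite pending-task queue
                new ThreadPoolExecutor.AbortPolicy());            // excess load is rejected, not buffered
        pool.submit(() -> {});                                    // accepted while under the bound
        System.out.println("threads=" + threads + " maxPendingTasks=" + maxPendingTasks);
        pool.shutdown();
    }
}
```

With a finite queue and AbortPolicy, overload surfaces as rejection (backpressure) instead of unbounded buffering of request state, which is the behavior the ticket is asking whether Netty currently provides.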
--
This message was sent by Atlassian Jira
(v8.20.7#820007)