[
https://issues.apache.org/jira/browse/HBASE-27112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Kyle Purtell resolved HBASE-27112.
-----------------------------------------
Fix Version/s: (was: 2.5.0)
(was: 3.0.0-alpha-4)
Assignee: (was: Andrew Kyle Purtell)
Resolution: Not A Problem
It looks like we don't need to do this now.
> Investigate Netty resource usage limits
> ---------------------------------------
>
> Key: HBASE-27112
> URL: https://issues.apache.org/jira/browse/HBASE-27112
> Project: HBase
> Issue Type: Sub-task
> Components: IPC/RPC
> Affects Versions: 2.5.0
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> We leave Netty-level resource limits unbounded. The number of threads to
> use for the event loop defaults to 0 (unbounded), and the default for
> io.netty.eventLoop.maxPendingTasks is INT_MAX.
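> For illustration, here is roughly how those two defaults surface in the
> Netty 4 API (a sketch against Netty 4 only, not the actual wiring in our
> NettyRpcServer):
> {code:java}
> import io.netty.channel.nio.NioEventLoopGroup;
>
> public class NettyDefaultsSketch {
>   public static void main(String[] args) throws Exception {
>     // Passing 0 tells Netty to choose the group's thread count itself;
>     // we never impose a bound of our own.
>     NioEventLoopGroup group = new NioEventLoopGroup(0);
>
>     // The per-event-loop task queue depth comes from this system property;
>     // when unset it is Integer.MAX_VALUE, i.e. effectively unbounded.
>     int maxPendingTasks =
>         Integer.getInteger("io.netty.eventLoop.maxPendingTasks", Integer.MAX_VALUE);
>     System.out.println("maxPendingTasks = " + maxPendingTasks);
>
>     group.shutdownGracefully().sync();
>   }
> }
> {code}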
> We don't do this for our own RPC handlers. We have a notion of maximum
> handler pool size, with a default of 30, typically raised in production by
> the user. We constrain the depth of the request queue in multiple ways:
> limits on the number of queued calls, limits on the total size of call data
> that can be queued (to avoid memory overrun), CoDel conditioning of the
> call queues if it is enabled, and so on.
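> For reference, a sketch of those existing RPC-layer knobs, set
> programmatically here purely for illustration (the values shown are my
> understanding of the current defaults; in practice these live in
> hbase-site.xml):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> public class RpcLimitsSketch {
>   public static void main(String[] args) {
>     Configuration conf = new Configuration();
>     // Maximum handler pool size, default 30.
>     conf.setInt("hbase.regionserver.handler.count", 30);
>     // Cap on the number of queued calls, default 10x the handler count.
>     conf.setInt("hbase.ipc.server.max.callqueue.length", 300);
>     // Cap on the total bytes of queued call data, default 1 GiB.
>     conf.setLong("hbase.ipc.server.max.callqueue.size", 1024L * 1024 * 1024);
>     // Optional CoDel conditioning of the call queues.
>     conf.set("hbase.ipc.server.callqueue.type", "codel");
>   }
> }
> {code}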
> Under load, can we pile up an excess of pending request state, such as
> direct buffers containing request bytes, at the Netty layer because of
> downstream resource limits? Those limits act as a bottleneck, as intended,
> and previously they would also have applied backpressure through the RPC
> layer, because SimpleRpcServer capped its reader threads
> ("hbase.ipc.server.read.threadpool.size", default 10). Netty, by
> comparison, may be able to queue up much more, because it has been
> optimized to prefer concurrency.
> Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0
> (unbounded). I don't know what it can actually reach in production, because
> we lack the metric, but there are diminishing returns when threads > cores,
> so a reasonable default here could be
> Runtime.getRuntime().availableProcessors() instead of unbounded.
> maxPendingTasks probably should not be INT_MAX, but that may matter less.
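> A sketch of what bounded defaults might look like (the cap of 4096 is
> purely illustrative, and the property has to be set before Netty loads,
> since it is read once into a static default):
> {code:java}
> import io.netty.channel.nio.NioEventLoopGroup;
>
> public class BoundedEventLoopSketch {
>   public static void main(String[] args) throws Exception {
>     // Bound the per-event-loop task queue; must happen before the first
>     // event loop class is initialized. 4096 is an illustrative value.
>     System.setProperty("io.netty.eventLoop.maxPendingTasks", "4096");
>
>     // Proposed default: one event loop thread per core instead of 0.
>     int threads = Runtime.getRuntime().availableProcessors();
>     NioEventLoopGroup group = new NioEventLoopGroup(threads);
>
>     group.shutdownGracefully().sync();
>   }
> }
> {code}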
> The tasks here are:
> - Instrument Netty-level resources to better understand actual resource
> allocation under load. Investigate what we need to plug in, and where, to
> gain visibility (see the sketch after this list).
> - Where instrumentation designed for this issue can be implemented as
> low-overhead metrics, consider formally adding them as metrics.
> - Based on the findings from this instrumentation, consider and implement
> next steps. The goal would be to limit concurrency at the Netty layer in
> such a way that performance remains good and resource usage does not
> balloon under load.
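> As a starting point for the instrumentation, something like the following
> could be polled periodically (a sketch; it assumes we can reach the
> server's EventLoopGroup and that the default pooled allocator is in use):
> {code:java}
> import io.netty.buffer.PooledByteBufAllocator;
> import io.netty.buffer.PooledByteBufAllocatorMetric;
> import io.netty.channel.EventLoopGroup;
> import io.netty.channel.nio.NioEventLoopGroup;
> import io.netty.util.concurrent.EventExecutor;
> import io.netty.util.concurrent.SingleThreadEventExecutor;
>
> public class NettyInstrumentationSketch {
>   static void report(EventLoopGroup group) {
>     int loops = 0;
>     long pending = 0;
>     for (EventExecutor executor : group) {
>       loops++;
>       // pendingTasks() may not be cheap for every queue implementation,
>       // which bears on whether this can ship as a low-overhead metric.
>       if (executor instanceof SingleThreadEventExecutor) {
>         pending += ((SingleThreadEventExecutor) executor).pendingTasks();
>       }
>     }
>     PooledByteBufAllocatorMetric m = PooledByteBufAllocator.DEFAULT.metric();
>     System.out.println("eventLoops=" + loops + " pendingTasks=" + pending
>         + " usedDirectMemory=" + m.usedDirectMemory()
>         + " usedHeapMemory=" + m.usedHeapMemory());
>   }
>
>   public static void main(String[] args) throws Exception {
>     EventLoopGroup group = new NioEventLoopGroup();
>     report(group);
>     group.shutdownGracefully().sync();
>   }
> }
> {code}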
> If the instrumentation and experimental results indicate no changes are
> necessary, we can close this as Not A Problem or WontFix.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)