[ https://issues.apache.org/jira/browse/HBASE-28128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Beaudreault resolved HBASE-28128.
---------------------------------------
Fix Version/s: 2.6.0, 2.5.6, 3.0.0-beta-1
Assignee: Bryan Beaudreault
Resolution: Fixed
> Reject requests at RPC layer when RegionServer is aborting
> ----------------------------------------------------------
>
> Key: HBASE-28128
> URL: https://issues.apache.org/jira/browse/HBASE-28128
> Project: HBase
> Issue Type: Improvement
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
> Fix For: 2.6.0, 2.5.6, 3.0.0-beta-1
>
>
> We recently had an operational incident where the RegionServer got aborted
> but failed to exit within a reasonable timeframe. We're going to tune
> hbase.regionserver.abort.timeout much lower than the 20-minute default, but
> even then it makes little sense to accept requests while the server is
> aborting.
> In our case, the server was impaired and not processing requests. The call
> queue was full, so NettyRpcServer kept trying and failing to add requests to
> the queue. This resulted in CallQueueTooBigException, which is not a meta
> cache clearing exception, so clients kept their cached region locations and
> continued sending requests to the same server. The server kept throwing these
> exceptions for several minutes until we finally killed it manually.
> I'd like to add a check in ServerRpcConnection.processRequest: if
> regionServer.isAborted() returns true, throw a RegionServerAbortedException
> rather than attempt to enqueue the request.
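A minimal, self-contained sketch of the proposed ordering (abort check first, then enqueue) and why it matters to clients. Everything here is hypothetical: the classes are simplified stand-ins named after the concepts in the issue, not HBase's real ServerRpcConnection, exception classes, or client retry logic. The client rule is also simplified; it treats only the queue-full error as non-meta-clearing, whereas the real client classifies more exception types.

```java
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the HBASE-28128 proposal; not HBase's actual API.
public class AbortCheckSketch {

  // Stand-in for CallQueueTooBigException: the call queue is full.
  static class CallQueueTooBigException extends IOException {
    CallQueueTooBigException(String msg) { super(msg); }
  }

  // Stand-in for RegionServerAbortedException: the server is aborting.
  static class RegionServerAbortedException extends IOException {
    RegionServerAbortedException(String msg) { super(msg); }
  }

  // Hypothetical server-side connection with a bounded call queue.
  static class Connection {
    private final BlockingQueue<String> callQueue;
    private final AtomicBoolean aborted = new AtomicBoolean(false);

    Connection(int queueCapacity) {
      this.callQueue = new ArrayBlockingQueue<>(queueCapacity);
    }

    void abort() { aborted.set(true); }

    boolean isAborted() { return aborted.get(); }

    // Models processRequest: consult the abort flag before touching the queue.
    void processRequest(String call) throws IOException {
      if (isAborted()) {
        // Proposed behavior: fail fast instead of trying to enqueue.
        throw new RegionServerAbortedException("Server is aborting; rejecting " + call);
      }
      if (!callQueue.offer(call)) {
        // Existing behavior when handlers are stuck and the queue stays full.
        throw new CallQueueTooBigException("Call queue is full; rejecting " + call);
      }
    }
  }

  // Simplified, hypothetical client rule: does this error evict cached region
  // locations? A queue-full error does not, so the client keeps retrying the
  // same server.
  static boolean clearsMetaCache(IOException e) {
    return !(e instanceof CallQueueTooBigException);
  }

  // Submits one call and describes the outcome as the client would see it.
  static String submit(Connection conn, String call) {
    try {
      conn.processRequest(call);
      return "queued";
    } catch (IOException e) {
      return e.getClass().getSimpleName() + " (clears meta cache: " + clearsMetaCache(e) + ")";
    }
  }

  public static void main(String[] args) {
    // Capacity 1 and nothing draining the queue, modeling impaired handlers.
    Connection conn = new Connection(1);
    System.out.println(submit(conn, "get-1")); // queued
    System.out.println(submit(conn, "get-2")); // CallQueueTooBigException (clears meta cache: false)
    conn.abort();
    System.out.println(submit(conn, "get-3")); // RegionServerAbortedException (clears meta cache: true)
  }
}
```

Checking the abort flag before the enqueue attempt means an aborting server answers with the same error regardless of queue state, which (under the simplified client rule above) lets clients drop their cached locations and move on instead of retrying a server that will never drain its queue.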
--
This message was sent by Atlassian Jira
(v8.20.10#820010)