[
https://issues.apache.org/jira/browse/HBASE-27149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bryan Beaudreault updated HBASE-27149:
--------------------------------------
Assignee: Bryan Beaudreault
Labels: patch-available (was: )
Status: Patch Available (was: Open)
Submitted https://github.com/apache/hbase/pull/4604
> Server should close scanner if client times out before results are ready
> ------------------------------------------------------------------------
>
> Key: HBASE-27149
> URL: https://issues.apache.org/jira/browse/HBASE-27149
> Project: HBase
> Issue Type: Improvement
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
> Labels: patch-available
>
> When heartbeats are enabled, we try to return from a scan before
> {{clientTimeout / 2}} millis have passed. Previously, this did not account for
> queue time, so the call could still easily time out. That problem was handled in
> HBASE-27048. It's still possible to time out if heartbeats are disabled, if we
> have queued for longer than {{clientTimeout / 2}} millis and the scan is slow,
> or if the RegionScanner is otherwise delayed in returning.
> How a scanner timeout is handled by the client depends on the point at which
> the timeout occurred:
> * In openScanner(), the call will be retried. We will not have received a
> scannerId, so we cannot close the scanner that may have been opened on the
> server side.
> * In next(), the timeout will bubble up and fail the scan. In this case we
> try to close the scanner, but that could be interrupted if the server is
> overwhelmed (the close call gets queued and then dropped) or the client
> terminates.
> Active scanners carry with them a non-trivial amount of memory and resource
> overhead on the server. In my experience, if a server becomes overwhelmed,
> client scanners can start to time out. Those scanners remain open on the
> server, contributing to memory and resource pressure, which slows the server
> down further. This is especially problematic when openScanner times out,
> because of the inherent retries of that call: a single scan can leave
> multiple "leaked" scanners on the server before finally failing.
> We should attempt to close the scanner on the server side when a scan call
> takes longer than the client timeout to finish. I think this would be a
> matter of adding something like this to the end of RSRpcServices.scan:
> {code:java}
> if (EnvironmentEdgeManager.currentTime() > rpcCall.getDeadline()) {
>   throw new TimeoutIOException("Client deadline exceeded, cannot return results");
> }
> {code}
> We already have a catch of IOException, wherein we close the scanner. The
> actual exception thrown shouldn't matter much since the client will not
> receive the response.
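
For illustration only, here is a minimal, self-contained sketch of the pattern described above: check the deadline once results are ready, throw, and let the existing IOException handler close the scanner. It does not use any HBase classes. FakeScanner, DeadlineExceededException, and the injected clock are hypothetical stand-ins for the RegionScanner, TimeoutIOException, and EnvironmentEdgeManager.currentTime(), and the real RSRpcServices.scan is considerably more involved.

```java
import java.io.IOException;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical stand-ins only; none of these types are HBase classes.
class ScanDeadlineSketch {

  /** Stand-in for a server-side scanner that holds resources until closed. */
  static final class FakeScanner implements AutoCloseable {
    private boolean closed;

    List<String> next() {
      return List.of("row-1", "row-2");
    }

    @Override
    public void close() {
      closed = true;
    }

    boolean isClosed() {
      return closed;
    }
  }

  /** Stand-in for the timeout exception named in the proposal. */
  static final class DeadlineExceededException extends IOException {
    DeadlineExceededException(String message) {
      super(message);
    }
  }

  /**
   * Runs one scan call. If the clock is past deadlineMillis once the results are
   * ready, throws instead of returning, so the IOException handler closes the scanner.
   */
  static List<String> scan(FakeScanner scanner, long deadlineMillis, LongSupplier clock)
      throws IOException {
    try {
      List<String> results = scanner.next();
      if (clock.getAsLong() > deadlineMillis) {
        // The client has already given up, so nobody will read these results.
        throw new DeadlineExceededException("Client deadline exceeded, cannot return results");
      }
      return results;
    } catch (IOException e) {
      // Mirrors the existing catch block: any IOException closes the scanner.
      scanner.close();
      throw e;
    }
  }

  public static void main(String[] args) throws IOException {
    FakeScanner onTime = new FakeScanner();
    List<String> rows = scan(onTime, 1_000L, () -> 500L);
    System.out.println("on-time call: " + rows.size() + " rows, closed=" + onTime.isClosed());

    FakeScanner late = new FakeScanner();
    try {
      scan(late, 1_000L, () -> 1_500L);
    } catch (DeadlineExceededException e) {
      System.out.println("late call: " + e.getMessage() + ", closed=" + late.isClosed());
    }
  }
}
```

Injecting the clock as a LongSupplier is a choice made for this sketch so that both the on-time and the late path can be exercised deterministically; it is not part of the proposal itself.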
--
This message was sent by Atlassian Jira
(v8.20.10#820010)