[
https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349215#comment-14349215
]
Jonathan Lawlor commented on HBASE-13090:
-----------------------------------------
With HBASE-11544 now in, I was thinking of tackling this one next and was
looking for some feedback on the thought process:
Implementing the timeout server side would involve changes at three different
levels:
* RSRpcServices
* RegionScannerImpl/ReversedRegionScannerImpl
* StoreScanner
RSRpcServices could maintain a variable, something along the lines of
remainingScanTime. This value could be initialized to some fraction of the
scanner timeout (maybe half would be good enough?). On each call to
RegionScanner#nextRaw, RSRpcServices would communicate that the RegionScanner
can take at most remainingScanTime to retrieve a Result -- if a Result cannot
be formed in that time, a timeout occurs. The RegionScanner would communicate
this same remainingScanTime to the StoreScanner so that calls to
InternalScanner#next() may also time out if they are taking too long.
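A rough sketch of the time budget idea, to make the discussion concrete. This
is not HBase code: ScanTimeBudgetSketch, Batch, nextBatch and initialBudgetMs
are hypothetical names, rows are modelled as plain strings, and the clock is
injected so the behaviour can be exercised without real waiting. The "half of
the scanner timeout" fraction is just the guess from above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a server-side scan time budget; not HBase API. */
final class ScanTimeBudgetSketch {

    /** What one bounded scan call hands back to the caller. */
    static final class Batch {
        final List<String> rows;
        final boolean timedOut;

        Batch(List<String> rows, boolean timedOut) {
            this.rows = rows;
            this.timedOut = timedOut;
        }
    }

    /** Assumed policy: spend at most half of the scanner timeout per RPC. */
    static long initialBudgetMs(long scannerTimeoutMs) {
        return scannerTimeoutMs / 2;
    }

    /**
     * Pulls rows until the limit is reached, the source is exhausted, or
     * remainingScanTimeMs has elapsed, whichever comes first.
     */
    static Batch nextBatch(Iterator<String> source, int limit,
                           long remainingScanTimeMs, LongSupplier clockMs) {
        long deadline = clockMs.getAsLong() + remainingScanTimeMs;
        List<String> rows = new ArrayList<>();
        while (rows.size() < limit && source.hasNext()) {
            // The budget is checked before each row is fetched.
            if (clockMs.getAsLong() >= deadline) {
                return new Batch(rows, true);
            }
            rows.add(source.next());
        }
        return new Batch(rows, false);
    }
}
```

In the real change the same deadline would be handed from RSRpcServices to the
RegionScanner and on to the StoreScanner rather than being checked in one loop.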
Note that if partial Results are NOT supported by the scan configuration (as is
the case for small scans, and scans with a filter that requires whole rows to
be read before a filtering decision can be made) then the timeout would not be
enforceable within StoreScanner but only within RegionScannerImpl and
RSRpcServices. This means that it would still be possible to time out due to a
single long running StoreScanner#next() call in the event that partial Results
are not supported.
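To illustrate the difference, here is a small sketch of a row read that only
honours the deadline when partial Results are allowed. Again this is not HBase
code: PartialResultTimeoutSketch, RowRead and readRow are hypothetical, cells
are plain strings, and the clock is injected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical sketch: deadline checks inside a single row read. */
final class PartialResultTimeoutSketch {

    /** Cells read so far, plus whether the row was cut short. */
    static final class RowRead {
        final List<String> cells;
        final boolean partial;

        RowRead(List<String> cells, boolean partial) {
            this.cells = cells;
            this.partial = partial;
        }
    }

    /**
     * Reads the cells of one row. The deadline is only consulted when
     * allowPartial is true; otherwise the whole row is read no matter how
     * long that takes, which is the gap described above.
     */
    static RowRead readRow(List<String> rowCells, boolean allowPartial,
                           long deadlineMs, LongSupplier clockMs) {
        List<String> cells = new ArrayList<>();
        for (String cell : rowCells) {
            if (allowPartial && clockMs.getAsLong() >= deadlineMs) {
                return new RowRead(cells, true);
            }
            cells.add(cell);
        }
        return new RowRead(cells, false);
    }
}
```

With allowPartial set to false the loop cannot stop early, so any timeout check
has to wait until the call returns to the region scanner level.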
If a timeout does occur on the server, we would have to decide how this should
be communicated back to the Client. I was thinking it would be most appropriate
to communicate this back to the client via fields in the ScanResponse rather
than flags on the Results in the ScanResponse (there is already a lot of state
information implied through the contents of the Results in the ScanResponse and
adding more seems like it would complicate things). Something along the lines
of a timeoutOccurred boolean flag may be sufficient. Then on the client side we
could decide if enough Results were accumulated prior to the timeout to service
the application request or if we must make another RPC to accumulate enough
Results.
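A minimal sketch of how the client side might use such a flag. The Response
class below is a stand-in for ScanResponse with only the fields needed here,
and TimeoutAwareClientSketch and collect are hypothetical names. The sketch
also assumes that an empty response without the flag set means the scan is
finished, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of client handling for a timeoutOccurred flag. */
final class TimeoutAwareClientSketch {

    /** Stand-in for ScanResponse; only the fields this sketch needs. */
    static final class Response {
        final List<String> results;
        final boolean timeoutOccurred;

        Response(List<String> results, boolean timeoutOccurred) {
            this.results = results;
            this.timeoutOccurred = timeoutOccurred;
        }
    }

    /**
     * Keeps issuing RPCs until enough Results have been accumulated.
     * A response that is empty only because the server hit its time budget
     * is not mistaken for the end of the scan.
     */
    static List<String> collect(Supplier<Response> rpc, int wanted) {
        List<String> accumulated = new ArrayList<>();
        while (accumulated.size() < wanted) {
            Response response = rpc.get();
            accumulated.addAll(response.results);
            // Assumption: empty and not timed out means nothing is left.
            if (response.results.isEmpty() && !response.timeoutOccurred) {
                break;
            }
        }
        return accumulated;
    }
}
```

Keeping the flag on the response rather than on individual Results means the
loop above never has to inspect Result contents to tell the two cases apart.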
If anyone else has been thinking about how to approach the solution to this
issue or has any other ideas please chime in. Any feedback would be much
appreciated.
> Progress heartbeats for long running scanners
> ---------------------------------------------
>
> Key: HBASE-13090
> URL: https://issues.apache.org/jira/browse/HBASE-13090
> Project: HBase
> Issue Type: New Feature
> Reporter: Andrew Purtell
>
> It can be necessary to set very long timeouts for clients that issue scans
> over large regions when all data in the region might be filtered out
> depending on scan criteria. This is a usability concern because it can be
> hard to identify what worst case timeout to use until scans are
> occasionally/intermittently failing in production, depending on variable scan
> criteria. It would be better if the client-server scan protocol can send back
> periodic progress heartbeats to clients as long as server scanners are alive
> and making progress.
> This is related but orthogonal to streaming scan (HBASE-13071).