[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners

Jonathan Lawlor (JIRA) Thu, 05 Mar 2015 10:35:57 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349215#comment-14349215
 ]


Jonathan Lawlor commented on HBASE-13090:
-----------------------------------------

With HBASE-11544 now in, I was thinking of tackling this one next and was 
looking for some feedback on the thought process:

Implementing the timeout server side would involve changes at three different 
levels: 
* RSRpcServices
* RegionScannerImpl/ReversedRegionScannerImpl
* StoreScanner 

RSRpcServices could maintain a variable, something along the lines of 
remainingScanTime, initialized to some fraction of the scanner timeout (maybe 
half would be good enough?). On each call to RegionScanner#nextRaw, 
RSRpcServices would communicate that the RegionScanner can take at most 
remainingScanTime to retrieve a Result. If a Result cannot be formed in that 
time, a timeout occurs. The RegionScanner would pass this same 
remainingScanTime down to the StoreScanner so that calls to 
InternalScanner#next() may also time out if they are taking too long. 
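
As a rough illustration of the time budget idea above, here is a minimal 
sketch. It is not HBase code: ScanTimeBudget, BudgetedScan, and Outcome are 
hypothetical names, the scanner is stood in for by a plain Iterator, and the 
clock is injected so the behaviour can be checked without real waiting.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical helper: tracks how much scan time is left for one RPC.
final class ScanTimeBudget {
    private final long deadline;
    private final LongSupplier clock;

    ScanTimeBudget(long budget, LongSupplier clock) {
        this.clock = clock;
        this.deadline = clock.getAsLong() + budget;
    }

    // Assumption from the comment: use half of the scanner timeout.
    static ScanTimeBudget halfOf(long scannerTimeout, LongSupplier clock) {
        return new ScanTimeBudget(scannerTimeout / 2, clock);
    }

    long remaining() {
        return Math.max(0L, deadline - clock.getAsLong());
    }

    boolean expired() {
        return remaining() == 0L;
    }
}

// Hypothetical stand-in for the RPC-level loop that drives the scanner.
final class BudgetedScan {
    static final class Outcome {
        final List<String> results = new ArrayList<>();
        boolean timeoutOccurred;
    }

    static Outcome collect(Iterator<String> scanner, int maxResults,
                           ScanTimeBudget budget) {
        Outcome out = new Outcome();
        while (out.results.size() < maxResults && scanner.hasNext()) {
            // Check the budget before asking the scanner for another Result.
            if (budget.expired()) {
                out.timeoutOccurred = true;
                break;
            }
            out.results.add(scanner.next());
        }
        return out;
    }
}
```

In a real implementation the same budget object (or its remaining value) would 
be what gets handed from one scanner level down to the next.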

Note that if partial Results are NOT supported by the scan configuration (as is 
the case for small scans, and scans with a filter that requires whole rows to 
be read before a filtering decision can be made) then the timeout would not be 
enforceable within StoreScanner but only within RegionScannerImpl and 
RSRpcServices. This means that, when partial Results are not supported, it 
would still be possible for the scan to time out due to a single long running 
StoreScanner#next() call.
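
To make the granularity difference concrete, here is a toy model (not HBase 
code; TimeoutGranularity and its parameters are made up for illustration). 
Each cell costs a fixed amount of time, and the budget is checked after every 
cell when partial Results are allowed, but only at row boundaries otherwise.

```java
// Hypothetical model of where the time limit can be checked.
final class TimeoutGranularity {
    // Returns the elapsed time at the point the scan stops.
    static long elapsedAtStop(int[] cellsPerRow, long costPerCell,
                              long budget, boolean partialResultsAllowed) {
        long elapsed = 0L;
        for (int cells : cellsPerRow) {
            for (int c = 0; c < cells; c++) {
                elapsed += costPerCell;
                // Cell-level check: only legal if a partial row may be returned.
                if (partialResultsAllowed && elapsed >= budget) {
                    return elapsed;
                }
            }
            // Row-level check: always possible, the row is complete here.
            if (elapsed >= budget) {
                return elapsed;
            }
        }
        return elapsed;
    }
}
```

With a single 100-cell row, a cost of 1 per cell, and a budget of 10, the 
partial-Result case stops at 10 while the whole-row case runs to 100, which is 
the overshoot described above.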

If a timeout does occur on the server, we would have to decide how this should 
be communicated back to the Client. I was thinking it would be most appropriate 
to communicate this back to the client via fields in the ScanResponse rather 
than flags on the Results in the ScanResponse (there is already a lot of state 
information implied through the contents of the Results in the ScanResponse and 
adding more seems like it would complicate things). Something along the lines 
of a timeoutOccurred boolean flag may be sufficient. Then on the Client side we 
could decide if enough Results were accumulated prior to the timeout to service 
the application request or if we must make another RPC to retrieve enough 
Results.
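
A sketch of what that client-side decision might look like, under some 
simplifying assumptions: ScanResponseSketch and ClientScanLoop are hypothetical 
types (not the real protobuf ScanResponse), and a response that is short of the 
requested count without the flag set is taken to mean the scanner is exhausted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical response shape: Results plus a response-level timeout flag.
final class ScanResponseSketch {
    final List<String> results;
    final boolean timeoutOccurred;

    ScanResponseSketch(List<String> results, boolean timeoutOccurred) {
        this.results = results;
        this.timeoutOccurred = timeoutOccurred;
    }
}

final class ClientScanLoop {
    // Keeps issuing RPCs until enough Results are accumulated.
    static List<String> fetch(Supplier<ScanResponseSketch> rpc, int wanted) {
        List<String> accumulated = new ArrayList<>();
        while (true) {
            ScanResponseSketch response = rpc.get();
            accumulated.addAll(response.results);
            if (accumulated.size() >= wanted) {
                return accumulated; // enough to service the application
            }
            if (!response.timeoutOccurred) {
                return accumulated; // assumed: nothing more to scan
            }
            // Server timed out but is still making progress: ask again.
        }
    }
}
```

A response with the flag set and few or no Results then acts as the progress 
heartbeat: the client learns the server is alive and simply issues the next 
RPC.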

If anyone else has been thinking about how to approach the solution to this 
issue or has any other ideas please chime in. Any feedback would be much 
appreciated.

> Progress heartbeats for long running scanners
> ---------------------------------------------
>
>                 Key: HBASE-13090
>                 URL: https://issues.apache.org/jira/browse/HBASE-13090
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>
> It can be necessary to set very long timeouts for clients that issue scans 
> over large regions when all data in the region might be filtered out 
> depending on scan criteria. This is a usability concern because it can be 
> hard to identify what worst case timeout to use until scans are 
> occasionally/intermittently failing in production, depending on variable scan 
> criteria. It would be better if the client-server scan protocol can send back 
> periodic progress heartbeats to clients as long as server scanners are alive 
> and making progress.
> This is related but orthogonal to streaming scan (HBASE-13071). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
