[ 
https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lawlor updated HBASE-13090:
------------------------------------
    Attachment: HBASE-13090-v1.patch

Attached is a rough work-in-progress patch. The patch does provide a working
implementation of heartbeats, but I believe it could be refined, so I am
looking for feedback.

The implementation points that I wanted to highlight for discussion are below:
* We wanted to move all time tracking into RegionScanner and StoreScanner and
leave RSRpcServices unscathed. I started off with that intention, but it
gradually became clear that it may be better to simply pass a timeLimit field
in the call to nextRaw from RSRpcServices. The reasoning is outlined below:
** While it is certainly possible to add a reset() or newSession() method to
the RegionScanner interface that would allow us to reset time tracking, the
issue becomes how to communicate that time limit down from the RegionScanner
into the StoreScanner (the scanner that loops through the cells for a
particular column family).
** The StoreScanners are stored in a KeyValueHeap in the RegionScanner, so it
would be possible to loop through them all and call a similar reset/newSession
method on each of them, but that seems dirty and wasteful. It seems more
appropriate to communicate the timeLimit down to only the relevant StoreScanner
via a timeLimit field in the InternalScanner#next(List<Cell> results, ...,
timeLimit) call.
** Since the RegionScanner also implements the InternalScanner interface, that
same next method would need to be implemented in RegionScannerImpl. Because of
this, I think it makes the most sense to simply have a nextRaw(List<Cell>, ...,
timeLimit) method to specify the timeLimit from RSRpcServices rather than a
reset/newSession call.
* To avoid polluting the returned Result array with state information about 
heartbeats, a new heartbeat flag has been added to the ScanResponse. Since only 
the ScannerCallable ever sees the ScanResponse returned from the server, I have 
exposed the method ScannerCallable#isHeartbeatMessage() to allow the 
ClientScanner to check if the most recent server response was a 
heartbeat/keep-alive message. 
* The method postHeapNext(List<Cell>) was added to RegionScannerImpl to allow
me to insert delays in between fetches of column family cells for testing. It
didn't feel clean, so I was wondering if anyone has ideas about alternative
approaches to emulate long-running scans on the server side.
* Since heartbeat messages have the potential to create partial results (in the
event that the timeout occurs in the middle of a row), we only allow heartbeat
messages if the client has specified that heartbeats are supported AND partial
results are also supported.
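
To make the points above concrete, here is a minimal sketch of a time-limited
fetch that reports a heartbeat when the limit is hit. It is not taken from the
patch: the class HeartbeatScanSketch, the Response type (a stand-in for
ScanResponse), the use of String in place of Cell, and the injected clock are
all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch; none of these names come from HBASE-13090-v1.patch.
class HeartbeatScanSketch {

    /** Stand-in for ScanResponse: the cells fetched so far plus a heartbeat flag. */
    static final class Response {
        final List<String> cells;
        final boolean heartbeat;

        Response(List<String> cells, boolean heartbeat) {
            this.cells = cells;
            this.heartbeat = heartbeat;
        }
    }

    /** A heartbeat may stop in the middle of a row, so both capabilities are required. */
    static boolean heartbeatsAllowed(boolean supportsHeartbeats, boolean supportsPartials) {
        return supportsHeartbeats && supportsPartials;
    }

    /**
     * Rough analogue of nextRaw(results, ..., timeLimit). Copies cells into results until
     * the source is exhausted or the absolute deadline is reached.
     *
     * @return true if the loop stopped because the deadline was reached
     */
    static boolean nextRaw(Iterator<String> source, List<String> results,
                           long deadlineMillis, LongSupplier clock) {
        while (source.hasNext()) {
            if (clock.getAsLong() >= deadlineMillis) {
                return true; // time limit reached before the source was drained
            }
            results.add(source.next());
        }
        return false;
    }

    /** Rough analogue of the RPC layer: computes the deadline once and passes it down. */
    static Response scan(Iterator<String> source, long timeLimitMillis, LongSupplier clock,
                         boolean supportsHeartbeats, boolean supportsPartials) {
        // Without heartbeat support the time limit is simply not enforced.
        long deadline = heartbeatsAllowed(supportsHeartbeats, supportsPartials)
                ? clock.getAsLong() + timeLimitMillis
                : Long.MAX_VALUE;
        List<String> results = new ArrayList<>();
        boolean timedOut = nextRaw(source, results, deadline, clock);
        return new Response(results, timedOut);
    }
}
```

On the client side, the equivalent of ScannerCallable#isHeartbeatMessage() would
read the heartbeat flag and keep the scanner open instead of treating a short or
empty response as the end of the scan.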

Ideas for improvement:
* As earlier discussion indicated, the tracking of limits in RSRpcServices is 
somewhat messy. When a new limit needs to be added, the RegionScanner and 
InternalScanner interfaces must both be changed. The limit logic may be 
simplified by defining something along the lines of a ScannerLimit object. The 
object would have a field per limit and would have an associated Builder that 
would allow us to specify only the limits we care about (if a limit is not set, 
then it doesn't get enforced). Then, in the future, if a new limit were needed,
it would only require adding a new field in ScannerLimit and the appropriate
enforcement logic (no changes to interfaces necessary). What do you guys think?
I think this would clean things up a bit, but wanted to see if there are any
objections first.
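
A minimal sketch of what such a ScannerLimit object might look like follows.
Only the name ScannerLimit and the builder idea come from the proposal above;
the three specific limits, the method names, and the use of -1 as the "not set"
marker are assumptions for illustration, not the design in the patch.

```java
// Hypothetical sketch of the proposed ScannerLimit; field and method names are assumptions.
class ScannerLimit {
    private static final long UNSET = -1L; // marker for "limit not specified"

    private final long batchLimit; // maximum number of cells
    private final long sizeLimit;  // maximum result size in bytes
    private final long timeLimit;  // absolute deadline in milliseconds

    private ScannerLimit(Builder b) {
        this.batchLimit = b.batchLimit;
        this.sizeLimit = b.sizeLimit;
        this.timeLimit = b.timeLimit;
    }

    static Builder builder() {
        return new Builder();
    }

    // A limit that was never set is never enforced.
    boolean batchLimitReached(long cellsSoFar) {
        return batchLimit != UNSET && cellsSoFar >= batchLimit;
    }

    boolean sizeLimitReached(long bytesSoFar) {
        return sizeLimit != UNSET && bytesSoFar >= sizeLimit;
    }

    boolean timeLimitReached(long nowMillis) {
        return timeLimit != UNSET && nowMillis >= timeLimit;
    }

    static final class Builder {
        private long batchLimit = UNSET;
        private long sizeLimit = UNSET;
        private long timeLimit = UNSET;

        Builder batchLimit(long cells) {
            this.batchLimit = requireNonNegative(cells);
            return this;
        }

        Builder sizeLimit(long bytes) {
            this.sizeLimit = requireNonNegative(bytes);
            return this;
        }

        Builder timeLimit(long deadlineMillis) {
            this.timeLimit = requireNonNegative(deadlineMillis);
            return this;
        }

        ScannerLimit build() {
            return new ScannerLimit(this);
        }

        private static long requireNonNegative(long value) {
            if (value < 0) {
                throw new IllegalArgumentException("limit must be non-negative: " + value);
            }
            return value;
        }
    }
}
```

With a shape like this, a caller that only cares about time could write
ScannerLimit.builder().timeLimit(deadline).build() and hand the single object
down through next/nextRaw, so adding another limit later would touch only
ScannerLimit and its enforcement points.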

Of course, the finer implementation points can be seen in the patch itself, and
any feedback would be appreciated. I will also post it to Review Board.

Thanks

> Progress heartbeats for long running scanners
> ---------------------------------------------
>
>                 Key: HBASE-13090
>                 URL: https://issues.apache.org/jira/browse/HBASE-13090
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>         Attachments: HBASE-13090-v1.patch
>
>
> It can be necessary to set very long timeouts for clients that issue scans 
> over large regions when all data in the region might be filtered out 
> depending on scan criteria. This is a usability concern because it can be 
> hard to identify what worst case timeout to use until scans are 
> occasionally/intermittently failing in production, depending on variable scan 
> criteria. It would be better if the client-server scan protocol can send back 
> periodic progress heartbeats to clients as long as server scanners are alive 
> and making progress.
> This is related but orthogonal to streaming scan (HBASE-13071). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
