[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357903#comment-14357903 ]

Jonathan Lawlor commented on HBASE-13090:
-----------------------------------------

Thanks for the comments [~stack]

bq. If only timeout, then maybe premature for ScanLimit unless anything in 
current Scan structure that might sit better in ScanLimit?
I was thinking that we could combine the batch limit, the size limit, and now
the time limit into a single ScannerLimit object. With this patch, the
InternalScanner and RegionScanner interfaces end up with a long cascade of
overloaded next() methods that looks like this:
{code}
NextState next(List<Cell> result) throws IOException;
...
NextState next(List<Cell> result, int batchLimit) throws IOException;
...
NextState next(List<Cell> result, int batchLimit, long sizeLimit) throws IOException;
...
NextState next(List<Cell> result, int batchLimit, long sizeLimit, long timeLimit) throws IOException;
{code}

Each new limit adds yet another overload, so the interfaces get uglier with
every addition. With ScannerLimit, they would shrink to just two signatures:

{code}
NextState next(List<Cell> result) throws IOException;
...
NextState next(List<Cell> result, ScannerLimit limit) throws IOException;
{code}

The ScannerLimit object would carry whichever limits are specified: it may
contain only a time limit, or a time limit, a batch limit, and a size limit
together.
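A rough sketch of what ScannerLimit could look like, assuming unset limits are marked with a sentinel value and built through a builder. The class, its builder, and all method names here are hypothetical illustrations, not existing HBase API.

```java
// Hypothetical sketch of the proposed ScannerLimit (not an existing HBase class).
// Each limit is optional; UNSET marks a limit that was not specified.
final class ScannerLimit {
    private static final long UNSET = -1L;

    private final long batchLimit;   // max cells per next() call
    private final long sizeLimit;    // max accumulated result size in bytes
    private final long timeLimitMs;  // max time in milliseconds per next() call

    private ScannerLimit(long batchLimit, long sizeLimit, long timeLimitMs) {
        this.batchLimit = batchLimit;
        this.sizeLimit = sizeLimit;
        this.timeLimitMs = timeLimitMs;
    }

    static Builder newBuilder() {
        return new Builder();
    }

    boolean hasBatchLimit() { return batchLimit != UNSET; }
    boolean hasSizeLimit()  { return sizeLimit != UNSET; }
    boolean hasTimeLimit()  { return timeLimitMs != UNSET; }

    // An unset limit is never reached, so callers can check all three unconditionally.
    boolean batchLimitReached(long cellCount) { return hasBatchLimit() && cellCount >= batchLimit; }
    boolean sizeLimitReached(long sizeBytes)  { return hasSizeLimit() && sizeBytes >= sizeLimit; }
    boolean timeLimitReached(long elapsedMs)  { return hasTimeLimit() && elapsedMs >= timeLimitMs; }

    static final class Builder {
        private long batchLimit = UNSET;
        private long sizeLimit = UNSET;
        private long timeLimitMs = UNSET;

        Builder setBatchLimit(int cells)  { batchLimit = requirePositive(cells); return this; }
        Builder setSizeLimit(long bytes)  { sizeLimit = requirePositive(bytes); return this; }
        Builder setTimeLimit(long millis) { timeLimitMs = requirePositive(millis); return this; }

        ScannerLimit build() {
            return new ScannerLimit(batchLimit, sizeLimit, timeLimitMs);
        }

        // Only positive limits make sense; zero or negative values are rejected.
        private static long requirePositive(long value) {
            if (value <= 0) {
                throw new IllegalArgumentException("limit must be positive: " + value);
            }
            return value;
        }
    }
}
```

With something like this, a caller that only cares about the heartbeat timeout would pass `ScannerLimit.newBuilder().setTimeLimit(timeoutMs).build()` to `next(result, limit)`, and adding a future limit means adding a field and a builder setter rather than another overload on every scanner interface.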

bq. What would be the downsides if default was to allow return of partials to 
clients?
Right now, partial result support is on by default, but when a scan is marked
as a small scan we disable partial results on the server side. Because a
heartbeat message could cut a row short and so produce a partial, small scans
would not get heartbeat messages either. All other scans would support
heartbeats.
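A minimal sketch of that rule, assuming a simplified server loop over string rows with an injected clock. ScanResponse, HeartbeatScanSketch, and the loop itself are hypothetical and much simpler than the real RegionScanner code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// Hypothetical response of one scan RPC: the rows that passed the filter, plus a
// flag saying the server returned early only because its time limit elapsed.
final class ScanResponse {
    final List<String> rows;
    final boolean heartbeat;

    ScanResponse(List<String> rows, boolean heartbeat) {
        this.rows = rows;
        this.heartbeat = heartbeat;
    }
}

final class HeartbeatScanSketch {
    // Partial results are disabled server side for small scans, and a heartbeat
    // could create a partial, so small scans never get heartbeats.
    static boolean heartbeatsAllowed(boolean smallScan) {
        return !smallScan;
    }

    // Filters rows until they run out. When heartbeats are allowed and the time
    // limit elapses with rows still left, it returns what it has matched so far,
    // flagged as a heartbeat. The clock is only checked between rows here, so
    // this sketch never cuts a row in half.
    static ScanResponse scan(Iterator<String> rows, Predicate<String> filter,
                             long timeLimitMs, boolean smallScan, LongSupplier clockMs) {
        boolean allowHeartbeat = heartbeatsAllowed(smallScan);
        long start = clockMs.getAsLong();
        List<String> matched = new ArrayList<>();
        while (rows.hasNext()) {
            String row = rows.next();
            if (filter.test(row)) {
                matched.add(row);
            }
            if (allowHeartbeat && rows.hasNext()
                    && clockMs.getAsLong() - start >= timeLimitMs) {
                return new ScanResponse(matched, true);
            }
        }
        return new ScanResponse(matched, false);
    }
}
```

In a real region scanner the time limit can expire in the middle of a row, which is exactly why a heartbeat could produce a partial and why small scans have to opt out.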

bq. since you can't specify your own Scanner implementation serverside (you 
can't right?)
As far as I can tell, there is no clean way to plug in your own StoreScanner
implementation. On closer inspection, though, it looks like I can supply my own
KeyValueHeap implementation inside the RegionScanners, which might let me
remove the ugly postHeapNext method. Going to investigate further to see
whether it can be taken out.

bq. When do I call isHeartbeatMessage? At what point in the processing?
Currently it is called inside ClientScanner.java once the Result array comes
back from the server. Checking it there tells us whether the most recent
response from the server (the one that returned that Result array) was a
heartbeat message.
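A sketch of why the check matters on the client, assuming (for this sketch only) that an empty, non-heartbeat response means the region is exhausted. RpcResult, ScanRpc, and drainRegion are hypothetical stand-ins, not the actual ClientScanner code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of one scan RPC response as seen by the client.
final class RpcResult {
    final List<String> results;
    private final boolean heartbeatMessage;

    RpcResult(List<String> results, boolean heartbeatMessage) {
        this.results = results;
        this.heartbeatMessage = heartbeatMessage;
    }

    boolean isHeartbeatMessage() {
        return heartbeatMessage;
    }
}

// Hypothetical stand-in for the scan RPC to one region server.
interface ScanRpc {
    RpcResult next();
}

final class ClientLoopSketch {
    // Collects rows from one region. Assumption for this sketch: an empty,
    // non-heartbeat response means the region is exhausted.
    static List<String> drainRegion(ScanRpc rpc) {
        List<String> collected = new ArrayList<>();
        while (true) {
            RpcResult response = rpc.next();
            collected.addAll(response.results);
            if (response.isHeartbeatMessage()) {
                // The server stopped early to show it is still alive; an empty
                // Result array here does not mean the region is done.
                continue;
            }
            if (response.results.isEmpty()) {
                return collected;
            }
        }
    }
}
```

Without the flag, the empty Result array from the first heartbeat would be indistinguishable from the end of the region.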

> Progress heartbeats for long running scanners
> ---------------------------------------------
>
>                 Key: HBASE-13090
>                 URL: https://issues.apache.org/jira/browse/HBASE-13090
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>            Assignee: Jonathan Lawlor
>         Attachments: HBASE-13090-v1.patch
>
>
> It can be necessary to set very long timeouts for clients that issue scans 
> over large regions when all data in the region might be filtered out 
> depending on scan criteria. This is a usability concern because it can be 
> hard to identify what worst case timeout to use until scans are 
> occasionally/intermittently failing in production, depending on variable scan 
> criteria. It would be better if the client-server scan protocol can send back 
> periodic progress heartbeats to clients as long as server scanners are alive 
> and making progress.
> This is related but orthogonal to streaming scan (HBASE-13071). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
