[
https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357903#comment-14357903
]
Jonathan Lawlor commented on HBASE-13090:
-----------------------------------------
Thanks for the comments [~stack]
bq. If only timeout, then maybe premature for ScanLimit unless anything in
current Scan structure that might sit better in ScanLimit?
I was thinking that we could combine the batch limit, size limit, and now the
time limit into a single ScannerLimit object. With this patch, the
InternalScanner and RegionScanner interfaces have a long cascade of overloads
that looks like this:
{code}
NextState next(List<Cell> result) throws IOException;
...
NextState next(List<Cell> result, int batchLimit) throws IOException;
...
NextState next(List<Cell> result, int batchLimit, long sizeLimit) throws IOException;
...
NextState next(List<Cell> result, int batchLimit, long sizeLimit, long timeLimit) throws IOException;
{code}
As more limits are added, it gets uglier and uglier. The idea with ScannerLimit
would be to change it to this:
{code}
NextState next(List<Cell> result) throws IOException;
...
NextState next(List<Cell> result, ScannerLimit limit) throws IOException;
{code}
Here the ScannerLimit object can carry any combination of limits (it may
contain only a time limit, or it may contain a time limit, batch limit, and
size limit together).
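To make the idea concrete, here is a rough sketch of what such a limit holder could look like. This is not the class from the patch: the builder, the -1 "unset" sentinel, the reached() helper, and the choice to treat the time limit as an absolute deadline in milliseconds are all assumptions made for illustration.

```java
// Hypothetical sketch of a ScannerLimit holder; names and semantics are
// assumptions for illustration, not the class from the HBASE-13090 patch.
final class ScannerLimit {
    /** Sentinel meaning "this limit is not set" (an assumption of this sketch). */
    static final long NO_LIMIT = -1L;

    private final int batchLimit;  // max number of cells per next() call
    private final long sizeLimit;  // max accumulated size in bytes
    private final long timeLimit;  // absolute deadline in epoch milliseconds

    private ScannerLimit(int batchLimit, long sizeLimit, long timeLimit) {
        this.batchLimit = batchLimit;
        this.sizeLimit = sizeLimit;
        this.timeLimit = timeLimit;
    }

    static Builder builder() {
        return new Builder();
    }

    boolean hasBatchLimit() { return batchLimit > 0; }
    boolean hasSizeLimit() { return sizeLimit > 0; }
    boolean hasTimeLimit() { return timeLimit > 0; }

    /** True if any limit that is set has been met or exceeded. */
    boolean reached(int cellCount, long byteCount, long nowMillis) {
        return (hasBatchLimit() && cellCount >= batchLimit)
            || (hasSizeLimit() && byteCount >= sizeLimit)
            || (hasTimeLimit() && nowMillis >= timeLimit);
    }

    /** Builder so callers set only the limits they care about. */
    static final class Builder {
        private int batchLimit = (int) NO_LIMIT;
        private long sizeLimit = NO_LIMIT;
        private long timeLimit = NO_LIMIT;

        Builder batchLimit(int v) { this.batchLimit = v; return this; }
        Builder sizeLimit(long v) { this.sizeLimit = v; return this; }
        Builder timeLimit(long v) { this.timeLimit = v; return this; }

        ScannerLimit build() {
            return new ScannerLimit(batchLimit, sizeLimit, timeLimit);
        }
    }
}
```

With something like this, a caller that only cares about the deadline would write ScannerLimit.builder().timeLimit(deadline).build(), and adding a fourth kind of limit later would not require another next(...) overload.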
bq. What would be the downsides if default was to allow return of partials to
clients?
Right now partial result support is on by default, but when the scan is
specified to be a small scan we disable partial results server side. This
means that small scans would not allow heartbeat messages either, since
heartbeats could potentially create partials. Outside of small scans,
heartbeats would be supported.
bq. since you can't specify your own Scanner implementation serverside (you
can't right?)
As far as I can tell, there is no nice way to specify your own StoreScanner
implementation, but upon further investigation it looks like I can specify my
own KeyValueHeap implementation inside the RegionScanners. That would let me
remove the ugly postHeapNext method. I'm going to investigate further to
confirm that it can be taken out.
bq. When do I call isHeartbeatMessage? At what point in the processing?
Currently it is used inside ClientScanner.java after the Result array comes
back from the server. Checking it there tells us whether the most recent
response from the server (the one that returned the Result array) was a
heartbeat message.
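A simplified sketch of that check is below. It does not use the real ClientScanner code: ScanResponse, HeartbeatAwareClient, and drain() are hypothetical stand-ins, rows are plain strings, and the sketch assumes that an empty response which is not a heartbeat means the scanner is exhausted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for one scan RPC response; not an HBase class.
final class ScanResponse {
    final List<String> results;  // rows returned by this call (may be empty)
    final boolean heartbeat;     // true if the server returned early just to stay alive

    ScanResponse(List<String> results, boolean heartbeat) {
        this.results = results;
        this.heartbeat = heartbeat;
    }
}

// Hypothetical client loop showing where the heartbeat flag is consulted.
final class HeartbeatAwareClient {
    static List<String> drain(Iterator<ScanResponse> responses) {
        List<String> rows = new ArrayList<>();
        while (responses.hasNext()) {
            ScanResponse r = responses.next();
            rows.addAll(r.results);
            // Check the flag right after the results come back: an empty
            // heartbeat means "still scanning", an empty non-heartbeat
            // response is treated as end of scan (assumption of this sketch).
            if (r.results.isEmpty() && !r.heartbeat) {
                break;
            }
        }
        return rows;
    }
}
```

The point of checking at this spot is that the client can tell an empty "still working" response apart from an empty "nothing left" response before deciding whether to issue the next call.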
> Progress heartbeats for long running scanners
> ---------------------------------------------
>
> Key: HBASE-13090
> URL: https://issues.apache.org/jira/browse/HBASE-13090
> Project: HBase
> Issue Type: New Feature
> Reporter: Andrew Purtell
> Assignee: Jonathan Lawlor
> Attachments: HBASE-13090-v1.patch
>
>
> It can be necessary to set very long timeouts for clients that issue scans
> over large regions when all data in the region might be filtered out
> depending on scan criteria. This is a usability concern because it can be
> hard to identify what worst case timeout to use until scans are
> occasionally/intermittently failing in production, depending on variable scan
> criteria. It would be better if the client-server scan protocol can send back
> periodic progress heartbeats to clients as long as server scanners are alive
> and making progress.
> This is related but orthogonal to streaming scan (HBASE-13071).