[
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166079#comment-14166079
]
James Taylor commented on PHOENIX-1267:
---------------------------------------
You lost me at "pread" :-)
I agree that we should at a minimum provide the hints, as that's easy.
So using small scan is based on how much data you're reading, right? Not how
much data you're returning from the server? Or is it both?
In all cases in Phoenix, each scan will be within a single region and will scan
over at most a configurable number of bytes (i.e. determined by the guidepost
depth stats config). I'm going to call that a *segment*. Does that help us, in
that we know in advance how many segments we're scanning over?
Let me throw some Phoenix situations at you and if you have a chance, tell me
if you think they'd benefit from using small scan:
- a point lookup. We know how many keys we're looking for in advance and they
are complete row keys. Would using/not using small scan depend on how many
point keys we're looking for? Or on how many segments we're looking for them in?
- an ungrouped aggregation (i.e. it'll return a single row, but potentially
scan lots of rows).
- a grouped aggregation.
- a scan where we know we're only scanning < N rows. This is the
ChunkedResultIterator case, where we run the scan until a limit, and then run
it again, starting from where we left off. It's also the case where we have a
LIMIT on a non-aggregate scan without an ORDER BY.
- an ordered scan. We sort on the RS side and then merge sort on the client.
The number of rows returned depends on the WHERE clause.
- any query knowing that it'll only traverse N guidepost segments (we know
this in advance), where N is the guidepost depth (maybe 1/10 of the region).
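For the "run the scan until a limit, and then run it again, starting from where we left off" case in the list above, a rough sketch of the pattern follows. It models the table as an in-memory `NavigableMap` and is not the actual ChunkedResultIterator; `ChunkedScanSketch`, `scanChunk` and `scanAll` are hypothetical names. Each bounded chunk is the kind of scan one might consider flagging as small, though whether that helps is exactly the open question here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical illustration only; does not use Phoenix or HBase classes.
final class ChunkedScanSketch {
    // Returns up to `limit` keys starting at startKey (inclusive or exclusive).
    static List<String> scanChunk(NavigableMap<String, String> table,
                                  String startKey, boolean inclusive, int limit) {
        List<String> keys = new ArrayList<>();
        for (String key : table.tailMap(startKey, inclusive).keySet()) {
            if (keys.size() == limit) {
                break; // chunk is full; the caller resumes after the last key
            }
            keys.add(key);
        }
        return keys;
    }

    // Reads the whole table as a sequence of bounded chunks.
    static List<List<String>> scanAll(NavigableMap<String, String> table, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        if (table.isEmpty()) {
            return chunks;
        }
        String resumeKey = table.firstKey();
        boolean inclusive = true; // only the very first chunk includes its start key
        while (true) {
            List<String> chunk = scanChunk(table, resumeKey, inclusive, chunkSize);
            if (chunk.isEmpty()) {
                break;
            }
            chunks.add(chunk);
            resumeKey = chunk.get(chunk.size() - 1); // start again where we left off
            inclusive = false;
        }
        return chunks;
    }
}
```

With five rows keyed "a" through "e" and a chunk size of 2, this would issue three bounded scans: two full chunks and a final chunk of one row.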
Thanks, [~lhofhansl].
> Set scan.setSmall(true) when appropriate
> ----------------------------------------
>
> Key: PHOENIX-1267
> URL: https://issues.apache.org/jira/browse/PHOENIX-1267
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Assignee: jay wong
> Attachments: smallscan.patch, smallscan2.patch, smallscan3.patch
>
>
> There's a nice optimization that has been in HBase for a while now to set a
> scan as "small". This prevents extra RPC calls, I believe. We should add a
> hint for queries that forces it to be set/not set, and make our best guess on
> when it should default to true.