[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

James Taylor (JIRA) Thu, 09 Oct 2014 17:43:09 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166079#comment-14166079
 ]


James Taylor commented on PHOENIX-1267:
---------------------------------------

You lost me at "pread" :-)

I agree that we should at a minimum provide the hints, as that's easy.

So using small scan is based on how much data you're reading, right? Not how 
much data you're returning from the server? Or is it both?

In all cases in Phoenix, each scan will be within a single region and will scan 
over at most a configurable number of bytes (i.e. determined by the guidepost 
depth stats config). I'm going to call that a *segment*. Does that help us, in 
that we know in advance how many segments we're scanning over? 

Let me throw some Phoenix situations at you and if you have a chance, tell me 
if you think they'd benefit from using small scan:
- a point lookup. We know how many keys we're looking for in advance and they 
are complete row keys. Would using/not using small scan depend on how many 
point keys we're looking for? Or on how many segments we're looking for them in?
- an ungrouped aggregation (i.e. it'll return a single row, but potentially 
scan lots of rows).
- a grouped aggregation.
- a scan where we know we're only scanning < N rows. This is the 
ChunkedResultIterator case, where we run the scan until a limit, and then run 
it again, starting from where we left off. It's also the case where we have a 
LIMIT on a non-aggregate scan without an ORDER BY.
- an ordered scan. We sort on the RS side and then merge sort on the client. 
The number of rows returned depends on the WHERE clause.
- any query knowing that it'll only traverse N guidepost segments (we know 
this in advance), where N is the guidepost depth (maybe 1/10 of the region).
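
To make the last case concrete, here is a minimal sketch of how a client might 
turn the known segment count into a small-scan decision. It is not Phoenix 
code: the class name, method, and parameters are hypothetical, and the 64 KB 
cutoff is an assumption borrowed from the HBase guidance that a scan within 
one data block can be treated as small.

```java
// Hypothetical helper, not part of Phoenix or HBase.
final class SmallScanHeuristic {
    // Assumed cutoff: the default HBase data block size (64 KB).
    static final long DEFAULT_THRESHOLD_BYTES = 64L * 1024;

    private SmallScanHeuristic() {}

    /**
     * Returns true when the upper bound on bytes read fits under the threshold.
     *
     * @param segmentsToScan  number of guidepost segments the scan traverses
     * @param bytesPerSegment configured maximum bytes per segment
     * @param thresholdBytes  assumed cutoff below which a scan counts as small
     */
    static boolean shouldUseSmallScan(long segmentsToScan,
                                      long bytesPerSegment,
                                      long thresholdBytes) {
        if (segmentsToScan < 0 || bytesPerSegment <= 0 || thresholdBytes <= 0) {
            throw new IllegalArgumentException(
                "segmentsToScan must be >= 0; sizes must be > 0");
        }
        long maxBytes;
        try {
            // Upper bound on the data read: segments * bytes per segment.
            maxBytes = Math.multiplyExact(segmentsToScan, bytesPerSegment);
        } catch (ArithmeticException overflow) {
            // Too large to represent, so certainly not a small scan.
            return false;
        }
        return maxBytes <= thresholdBytes;
    }
}
```

A caller would pass the result to scan.setSmall(...) on the HBase Scan. Note 
that this bounds the data read, not the data returned, so it does not by 
itself settle the aggregation cases above.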

Thanks, [~lhofhansl].

> Set scan.setSmall(true) when appropriate
> ----------------------------------------
>
>                 Key: PHOENIX-1267
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1267
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: jay wong
>         Attachments: smallscan.patch, smallscan2.patch, smallscan3.patch
>
>
> There's a nice optimization that has been in HBase for a while now to set a 
> scan as "small". This prevents extra RPC calls, I believe. We should add a 
> hint for queries that forces it to be set/not set, and make our best guess on 
> when it should default to true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
