[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164779#comment-14164779
 ] 

James Taylor commented on PHOENIX-1267:
---------------------------------------

Thinking about this more, I think we can set small scan to true most of the 
time. Our scans are parallelized, so each one only ever targets part of a 
region. So right away in QueryCompiler.compileSingleQuery(), we'd call 
scan.setSmall(true) unless there's a NO_SMALL_SCAN hint.

Then, in the following cases, we'd turn it off:
- If there's no where clause and no limit. This is probably easiest to 
determine in QueryCompiler.compileSingleQuery().
- If a second "chunk" of data is returned from a parallel scan. This can be 
done in ChunkedResultIterator.getResultIterator().
- If there's an order by and we're traversing a large number of segments 
(with the threshold based on a new config parameter). The order by doesn't go 
through ChunkedResultIterator, so we don't have a good way of turning the 
option back off once the scan is running. You can estimate how much data the 
scan will traverse by looking at splits.size(), which approximates the number 
of 30MB chunks (phoenix.stats.guidepost.width) of data the scan will cover.

Then in BaseQueryIterator.iterators(), we'd call scan.setSmall(true) if the 
SMALL_SCAN hint is used, which would override the above logic.
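
To make the proposal concrete, here is a rough, self-contained sketch of the 
decision logic described above. It is not Phoenix code: SmallScanPolicy, the 
Hint enum, and the maxOrderBySplits threshold (standing in for the new config 
parameter, which has no name yet) are all hypothetical, and treating "large" 
as "more splits than the threshold" is an assumption. The boolean it returns 
is what would be passed to HBase's Scan.setSmall(boolean).

```java
/** Hypothetical helper for illustration only; not part of Phoenix or HBase. */
final class SmallScanPolicy {

    /** Hypothetical stand-in for the SMALL_SCAN / NO_SMALL_SCAN query hints. */
    enum Hint { NONE, SMALL_SCAN, NO_SMALL_SCAN }

    private SmallScanPolicy() {}

    /**
     * Decision made at compile time, before any data has been read.
     *
     * @param splitCount       number of parallel scan segments (splits.size())
     * @param maxOrderBySplits assumed threshold from the proposed config parameter
     */
    static boolean initialSetting(Hint hint, boolean hasWhere, boolean hasLimit,
                                  boolean hasOrderBy, int splitCount,
                                  int maxOrderBySplits) {
        if (hint == Hint.SMALL_SCAN) {
            return true;   // explicit hint overrides everything below
        }
        if (hint == Hint.NO_SMALL_SCAN) {
            return false;
        }
        if (!hasWhere && !hasLimit) {
            return false;  // unbounded scan: no where clause and no limit
        }
        if (hasOrderBy && splitCount > maxOrderBySplits) {
            return false;  // order by over many segments, cannot be turned off later
        }
        return true;       // default: small scan on
    }

    /**
     * Decision made when a parallel scan fetches another chunk.
     *
     * @param chunkIndex zero-based index of the chunk about to be fetched
     */
    static boolean settingForChunk(Hint hint, boolean current, int chunkIndex) {
        if (hint == Hint.SMALL_SCAN) {
            return true;   // hint still wins
        }
        // Only the first chunk keeps the small-scan setting.
        return current && chunkIndex == 0;
    }
}
```

With this shape, the compile-time call would live next to 
QueryCompiler.compileSingleQuery() and the per-chunk call next to 
ChunkedResultIterator.getResultIterator(), but that placement is only how I 
read the proposal, not something the patches do today.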

Thoughts? [~lhofhansl] 

> Set scan.setSmall(true) when appropriate
> ----------------------------------------
>
>                 Key: PHOENIX-1267
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1267
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: jay wong
>         Attachments: smallscan.patch, smallscan2.patch, smallscan3.patch
>
>
> There's a nice optimization that has been in HBase for a while now to set a 
> scan as "small". This prevents extra RPC calls, I believe. We should add a 
> hint for queries that forces it to be set/not set, and make our best guess on 
> when it should default to true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
