[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164803#comment-14164803
 ] 

Lars Hofhansl commented on PHOENIX-1267:
----------------------------------------

Small scans also force pread as opposed to seek+read. seek+read is cheaper and 
also does prefetching at the datanodes, but only one scanner can use it per 
reader (i.e. HFile). See HBASE-7336.

It is tough to predict when to use it.

If the individual scans are so small that prefetching has no benefit, or that 
two RPCs would be significant as opposed to one RPC, then a small scan makes 
sense. The doc says "Generally, if the scan range is within one data 
block(64KB), it could be considered as a small scan."
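
As a rough sketch of that rule of thumb: the helper below is hypothetical (the class name, method name, and the idea that a byte estimate for the scan range is available are all assumptions, and nothing here calls into HBase or Phoenix). It only checks whether an estimated scan size fits within one data block.

```java
// Hypothetical helper for illustration only; not part of HBase or Phoenix.
final class SmallScanHeuristic {
    // Data block size quoted in the doc's rule of thumb (64KB).
    static final long DEFAULT_BLOCK_SIZE_BYTES = 64L * 1024;

    private SmallScanHeuristic() {}

    // Returns true when the estimated number of bytes the scan will touch
    // fits within a single data block. The estimate itself is assumed to
    // come from elsewhere (e.g. table statistics).
    static boolean fitsInOneBlock(long estimatedScanBytes, long blockSizeBytes) {
        return estimatedScanBytes >= 0 && estimatedScanBytes <= blockSizeBytes;
    }
}
```

Whether such a guess is good enough to drive a default is exactly what would need perf testing.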

Starting with a small scan and then switching would be cool. If we scan 
multiple HFile blocks and these blocks are not in the HBase blockcache, 
seek+read has the potential of being much faster. But we'd only eat that cost 
for one scan in the beginning.

Might be best to start with the hint only (and default to false) and perf test 
a variety of scenarios.
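
A minimal sketch of the hint-only behavior, assuming a three-state hint: the enum and its value names are made up for illustration and are not existing Phoenix hints. With no hint, the result is false, and the returned value is what would be passed to scan.setSmall(...).

```java
// Hypothetical hint resolution for illustration only; names are assumptions.
final class SmallScanHintResolver {
    // NONE means the query carried no hint either way.
    enum Hint { SMALL, NO_SMALL, NONE }

    private SmallScanHintResolver() {}

    // Hint-only policy: an explicit hint wins, otherwise default to false.
    static boolean resolveSmall(Hint hint) {
        switch (hint) {
            case SMALL:
                return true;
            case NO_SMALL:
                return false;
            default:
                return false; // no hint: keep the regular (non-small) scan
        }
    }
}
```

Keeping the default in one place would make it easy to flip later if the perf tests show a scenario where guessing true is safe.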


> Set scan.setSmall(true) when appropriate
> ----------------------------------------
>
>                 Key: PHOENIX-1267
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1267
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: jay wong
>         Attachments: smallscan.patch, smallscan2.patch, smallscan3.patch
>
>
> There's a nice optimization that has been in HBase for a while now to set a 
> scan as "small". This prevents extra RPC calls, I believe. We should add a 
> hint for queries that forces it to be set/not set, and make our best guess on 
> when it should default to true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
