[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166116#comment-14166116
 ] 

Lars Hofhansl commented on PHOENIX-1267:
----------------------------------------

A small scan does two things:
# avoids prefetch - if we're only reading a few KB of data, prefetching many 
megabytes (I think 2 MB by default) at the data nodes is a waste. A small scan 
avoids this by doing an HDFS positional read ("pread").
# avoids one RPC - you do not need to close the scanner via an extra RPC. If 
the scan itself takes only a short amount of time, the cost of that extra RPC 
can be significant.

I didn't get the point lookup case. Are those different from HBase Gets? Or 
are you talking about a skip_scan traversing a bunch of points that are known 
ahead of time? I'm assuming you mean a skip_scan here.

The only reliable case seems to be "a scan where we know we're only scanning < 
N rows", where we can reliably predict that the data scanned is in the ballpark 
of 64 KB to maybe 1 MB or so.
The "only traverse N guidepost segments" case qualifies only if the guidepost 
distances are known to be small (again in the 64 KB to 1 MB range).
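
To make the two cases above concrete, here is a minimal sketch of how such a 
check might look. Everything in it is hypothetical: the class and method names 
are invented for illustration and come from neither Phoenix nor HBase, the 
1 MB ceiling is just the upper end of the ballpark mentioned above, and the 
average row size is assumed to be available as an estimate. The result would 
be what gets passed to scan.setSmall(...).

```java
// Hypothetical helper for illustration only; not part of Phoenix or HBase.
final class SmallScanHeuristic {
    // Assumed ceiling: the upper end of the "64k maybe to 1mb" ballpark.
    static final long MAX_SMALL_SCAN_BYTES = 1024L * 1024L;

    private SmallScanHeuristic() {}

    // Case 1: the scan is known to touch fewer than maxRows rows.
    // avgRowBytes is an assumed estimate of the average row size.
    static boolean smallByRowLimit(long maxRows, long avgRowBytes) {
        return fits(maxRows, avgRowBytes);
    }

    // Case 2: the scan traverses a known number of guidepost segments,
    // each roughly guidepostWidthBytes wide.
    static boolean smallByGuideposts(long segments, long guidepostWidthBytes) {
        return fits(segments, guidepostWidthBytes);
    }

    // True only if count * unitBytes <= MAX_SMALL_SCAN_BYTES.
    // Non-positive inputs mean "unknown", so we do not claim the scan is small.
    // Dividing instead of multiplying avoids long overflow.
    private static boolean fits(long count, long unitBytes) {
        if (count <= 0 || unitBytes <= 0) {
            return false;
        }
        return count <= MAX_SMALL_SCAN_BYTES / unitBytes;
    }
}
```

With these assumptions, a scan limited to 100 rows of about 1 KB each would 
qualify, as would 4 guidepost segments of 64 KB each, while 4 segments of 
100 MB each would not. When the sizes are unknown the sketch errs on the side 
of a regular scan.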


> Set scan.setSmall(true) when appropriate
> ----------------------------------------
>
>                 Key: PHOENIX-1267
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1267
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: jay wong
>         Attachments: smallscan.patch, smallscan2.patch, smallscan3.patch
>
>
> There's a nice optimization that has been in HBase for a while now to set a 
> scan as "small". This prevents extra RPC calls, I believe. We should add a 
> hint for queries that forces it to be set/not set, and make our best guess on 
> when it should default to true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
