[ https://issues.apache.org/jira/browse/ACCUMULO-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845480#comment-13845480 ]

Chris McCubbin commented on ACCUMULO-261:
-----------------------------------------

I'm encountering the need for this setting yet again. The situation is that I
have an iterator stack that is expensive to re-seek. Sometimes I want all the
results ("bulk"); sometimes I only want a few ("top-k"). There really is no
good "one size fits all" table.scan.max.memory setting in this case. If I set
it small, the re-seek overhead kills performance on the bulk scan. If I set it
large, the server reads ahead far too many entries for the top-k use case, and
performance is again poor.
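To make the trade-off above concrete, here is a deliberately simplified Java
model. It assumes (this is an assumption, not Accumulo's exact server logic)
that the tablet server keeps adding entries to a batch until the batch's total
size reaches table.scan.max.memory, and that every batch after the first costs
one re-seek of the iterator stack. ScanMemoryModel and its methods are
hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of how table.scan.max.memory shapes scan batches.
final class ScanMemoryModel {

    // Assumption: entries are appended to a batch until its total size reaches
    // maxMemoryBytes, at which point the batch is returned to the client.
    static List<Integer> batchEntryCounts(int[] entrySizes, long maxMemoryBytes) {
        List<Integer> counts = new ArrayList<>();
        int count = 0;
        long bytes = 0;
        for (int size : entrySizes) {
            count++;
            bytes += size;
            if (bytes >= maxMemoryBytes) {
                counts.add(count);
                count = 0;
                bytes = 0;
            }
        }
        if (count > 0) {
            counts.add(count); // final, partially filled batch
        }
        return counts;
    }

    // Assumption: every batch after the first requires re-seeking the iterator stack.
    static int reseeksForBulkScan(int[] entrySizes, long maxMemoryBytes) {
        return Math.max(0, batchEntryCounts(entrySizes, maxMemoryBytes).size() - 1);
    }

    // Entries read on the server before the client holds at least k results.
    static int entriesReadForTopK(int[] entrySizes, long maxMemoryBytes, int k) {
        int read = 0;
        for (int batch : batchEntryCounts(entrySizes, maxMemoryBytes)) {
            read += batch;
            if (read >= k) {
                break;
            }
        }
        return read;
    }
}
```

With 1,000 entries of 100 bytes each, a 1,000-byte limit gives 99 re-seeks for
the bulk scan but reads only 10 entries for a top-5 query; a 1,000,000-byte
limit gives no re-seeks but reads all 1,000 entries for the same top-5 query.
Under this model, no single per-table value serves both access patterns, which
is why a per-scan setting is being requested.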

Also related: setBatchSize can only be called on Scanners, not on
BatchScanners.

> Scanner should support batch size specified in bytes
> ----------------------------------------------------
>
>                 Key: ACCUMULO-261
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-261
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client
>            Reporter: John Vines
>
> Currently the scanner allows a user to set the batch size as a number of
> entries. Unfortunately, this isn't very useful if entry sizes vary widely and
> you want to keep your internal memory footprint within a threshold. So we
> should also allow users to set the batch size as a maximum number of bytes
> to bring back.
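A minimal sketch of what the requested limit could look like: a batch closes
when it holds maxEntries entries or when adding the next entry would push it
past maxBytes, whichever comes first. This is not Accumulo's API;
ByteBoundedBatching, batchEnd, maxEntries, and maxBytes are hypothetical names,
and entries are represented only by their serialized sizes in bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a batch limit expressed both in entries and in bytes.
final class ByteBoundedBatching {

    // Returns the exclusive end index of the batch that starts at `start`.
    static int batchEnd(int[] entrySizes, int start, int maxEntries, long maxBytes) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be at least 1");
        }
        int end = start;
        long bytes = 0;
        while (end < entrySizes.length && end - start < maxEntries) {
            long next = bytes + entrySizes[end];
            // Always accept the first entry so an oversized entry cannot stall the scan.
            if (end > start && next > maxBytes) {
                break;
            }
            bytes = next;
            end++;
        }
        return end;
    }

    // Splits all entries into consecutive batches and returns each batch's entry count.
    static List<Integer> batchEntryCounts(int[] entrySizes, int maxEntries, long maxBytes) {
        List<Integer> counts = new ArrayList<>();
        int start = 0;
        while (start < entrySizes.length) {
            int end = batchEnd(entrySizes, start, maxEntries, maxBytes);
            counts.add(end - start);
            start = end;
        }
        return counts;
    }
}
```

The first entry of a batch is always accepted, so a single entry larger than
maxBytes still makes progress instead of stalling the scan. For entry sizes
{10, 10, 10, 5000, 10, 10} with maxEntries = 1000 and maxBytes = 100, the
batches hold 3, 1, and 2 entries.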



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
