[
https://issues.apache.org/jira/browse/CASSANDRA-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846161#action_12846161
]
Jonathan Ellis commented on CASSANDRA-789:
------------------------------------------
could this code

    // check if we need another batch
    if(i >= rows.size())
    {
        rows = null;
        return computeNext();
    }

be moved into maybeInit so computeNext doesn't have to recurse?
+1 otherwise, after formatting fixes (brace on newline, space between if and
open-paren)
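The refactor suggested above could be sketched roughly as follows: let maybeInit own the "need another batch" check and loop there, so computeNext never calls itself. The names maybeInit, computeNext, rows, and i come from the snippet; everything else (the backing list standing in for the remote range query, the batch-fetch logic) is an illustrative assumption, not the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch of a paging iterator where maybeInit owns the end-of-batch
// check, so computeNext never has to recurse.
public class PagingIterator {
    private List<String> rows;          // current batch; null when not fetched or exhausted
    private int i;                      // cursor into the current batch
    private int batchStart = 0;         // position in the (stand-in) backing store
    private final List<String> backing; // stands in for the remote range query
    private final int batchSize;

    public PagingIterator(List<String> backing, int batchSize) {
        this.backing = backing;
        this.batchSize = batchSize;
    }

    // Fetches a new batch whenever the current one is missing or consumed.
    private void maybeInit() {
        if (rows != null && i < rows.size())
            return; // still have rows left in the current batch
        int end = Math.min(batchStart + batchSize, backing.size());
        rows = (batchStart < end) ? backing.subList(batchStart, end) : null;
        batchStart = end;
        i = 0;
    }

    // No recursion: maybeInit has already swapped in the next batch if needed.
    public String computeNext() {
        maybeInit();
        if (rows == null)
            throw new NoSuchElementException();
        return rows.get(i++);
    }

    public boolean hasNext() {
        maybeInit();
        return rows != null;
    }

    public static void main(String[] args) {
        PagingIterator it = new PagingIterator(Arrays.asList("a", "b", "c", "d", "e"), 2);
        List<String> out = new ArrayList<>();
        while (it.hasNext())
            out.add(it.computeNext());
        System.out.println(out); // [a, b, c, d, e]
    }
}
```

With the check folded into maybeInit, computeNext stays a straight-line method regardless of how many empty or exhausted batches are skipped.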
> Add configurable range sizes, paging to hadoop range queries
> ------------------------------------------------------------
>
> Key: CASSANDRA-789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-789
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 0.7
>
> Attachments: CASSANDRA-789.patch
>
>
> For very large (billions) numbers of keys, the current hardcoded 4096 keys
> per InputSplit could cause the split generator to OOM, since all splits are
> held in memory at once. So we want to make 2 changes:
> 1) make the number of keys configurable
> 2) make record reader page instead of assuming it can read all rows into
> memory at once
> Note: going back to specifying number of splits instead of number of keys is
> bad for two reasons. First, it does not work with the standard hadoop
> mapred.min.split.size configuration option. Second, it means we have no way
> of measuring progress in the record reader, since we have no idea how many
> keys are in the split. If we specify number of keys, then we know (to within
> a small margin of error) how many keys to expect, even if we page.
> See CASSANDRA-775, CASSANDRA-342 for background.
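The progress argument in the description can be made concrete: once the split carries a configured expected key count, the record reader can report a bounded-error progress fraction even while paging. This is a minimal sketch; the class and method names are illustrative assumptions (only the getProgress() contract of returning a float in [0, 1] comes from Hadoop's RecordReader API), not the attached patch.

```java
// Sketch: progress estimation for a record reader that knows roughly how
// many keys its split contains (the configured keys-per-split), even
// though it fetches rows in pages. Names are illustrative.
public class SplitProgress {
    private final long expectedKeys; // e.g. the formerly hardcoded 4096, now configurable
    private long keysRead = 0;

    public SplitProgress(long expectedKeys) {
        this.expectedKeys = expectedKeys;
    }

    public void keyRead() {
        keysRead++;
    }

    // Mirrors RecordReader.getProgress(): a float in [0, 1]. The final page
    // may push keysRead slightly past the estimate, so clamp at 1.0.
    public float getProgress() {
        if (expectedKeys == 0)
            return 1.0f;
        return Math.min(1.0f, (float) keysRead / expectedKeys);
    }

    public static void main(String[] args) {
        SplitProgress p = new SplitProgress(4096);
        for (int i = 0; i < 1024; i++)
            p.keyRead();
        System.out.println(p.getProgress()); // 0.25
    }
}
```

This is exactly what is lost if splits are specified by count instead of by keys: with no per-split key estimate, getProgress() has nothing to divide by.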