[
https://issues.apache.org/jira/browse/CASSANDRA-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846161#action_12846161
]
Jonathan Ellis commented on CASSANDRA-789:
------------------------------------------
could this code

    // check if we need another batch
    if(i >= rows.size())
    {
        rows = null;
        return computeNext();
    }

be moved into maybeInit so computeNext doesn't have to recurse?
+1 otherwise, after formatting fixes (brace on newline, space between if and
open-paren)
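The refactor suggested above could be sketched roughly as follows: let maybeInit own the "need another batch" check and loop there, so computeNext never calls itself. The names maybeInit, computeNext, rows, and i come from the snippet; everything else (the backing list standing in for the remote range query, the batch-fetch logic) is an illustrative assumption, not the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch of a paging iterator where maybeInit owns the end-of-batch
// check, so computeNext never has to recurse.
public class PagingIterator {
    private List<String> rows;          // current batch; null when not fetched or exhausted
    private int i;                      // cursor into the current batch
    private int batchStart = 0;         // position in the (stand-in) backing store
    private final List<String> backing; // stands in for the remote range query
    private final int batchSize;

    public PagingIterator(List<String> backing, int batchSize) {
        this.backing = backing;
        this.batchSize = batchSize;
    }

    // Fetches a new batch whenever the current one is missing or consumed.
    private void maybeInit() {
        if (rows != null && i < rows.size())
            return; // still have rows left in the current batch
        int end = Math.min(batchStart + batchSize, backing.size());
        rows = (batchStart < end) ? backing.subList(batchStart, end) : null;
        batchStart = end;
        i = 0;
    }

    // No recursion: maybeInit has already swapped in the next batch if needed.
    public String computeNext() {
        maybeInit();
        if (rows == null)
            throw new NoSuchElementException();
        return rows.get(i++);
    }

    public boolean hasNext() {
        maybeInit();
        return rows != null;
    }

    public static void main(String[] args) {
        PagingIterator it = new PagingIterator(Arrays.asList("a", "b", "c", "d", "e"), 2);
        List<String> out = new ArrayList<>();
        while (it.hasNext())
            out.add(it.computeNext());
        System.out.println(out); // [a, b, c, d, e]
    }
}
```

With the check folded into maybeInit, computeNext stays a straight-line method regardless of how many empty or exhausted batches are skipped.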
> Add configurable range sizes, paging to hadoop range queries
> ------------------------------------------------------------
>
> Key: CASSANDRA-789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-789
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 0.7
>
> Attachments: CASSANDRA-789.patch
>
>
> For very large (billions) numbers of keys, the current hardcoded 4096 keys
> per InputSplit could cause the split generator to OOM, since all splits are
> held in memory at once. So we want to make 2 changes:
> 1) make the number of keys configurable
> 2) make record reader page instead of assuming it can read all rows into
> memory at once
> Note: going back to specifying number of splits instead of number of keys is
> bad for two reasons. First, it does not work with the standard hadoop
> mapred.min.split.size configuration option. Second, it means we have no way
> of measuring progress in the record reader, since we have no idea how many
> keys are in the split. If we specify number of keys, then we know (to within
> a small margin of error) how many keys to expect, even if we page.
> See CASSANDRA-775, CASSANDRA-342 for background.
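The progress argument in the description can be made concrete: once the split carries a configured expected key count, the record reader can report a bounded-error progress fraction even while paging. This is a minimal sketch; the class and method names are illustrative assumptions (only the getProgress() contract of returning a float in [0, 1] comes from Hadoop's RecordReader API), not the attached patch.

```java
// Sketch: progress estimation for a record reader that knows roughly how
// many keys its split contains (the configured keys-per-split), even
// though it fetches rows in pages. Names are illustrative.
public class SplitProgress {
    private final long expectedKeys; // e.g. the formerly hardcoded 4096, now configurable
    private long keysRead = 0;

    public SplitProgress(long expectedKeys) {
        this.expectedKeys = expectedKeys;
    }

    public void keyRead() {
        keysRead++;
    }

    // Mirrors RecordReader.getProgress(): a float in [0, 1]. The final page
    // may push keysRead slightly past the estimate, so clamp at 1.0.
    public float getProgress() {
        if (expectedKeys == 0)
            return 1.0f;
        return Math.min(1.0f, (float) keysRead / expectedKeys);
    }

    public static void main(String[] args) {
        SplitProgress p = new SplitProgress(4096);
        for (int i = 0; i < 1024; i++)
            p.keyRead();
        System.out.println(p.getProgress()); // 0.25
    }
}
```

This is exactly what is lost if splits are specified by count instead of by keys: with no per-split key estimate, getProgress() has nothing to divide by.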