Add configurable range sizes, paging to hadoop range queries
------------------------------------------------------------
Key: CASSANDRA-789
URL: https://issues.apache.org/jira/browse/CASSANDRA-789
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Jonathan Ellis
Priority: Minor
For very large numbers of keys (billions), the current hardcoded 4096 keys per
InputSplit could cause the split generator to OOM, since all splits are held in
memory at once. So we want to make two changes:
1) make the number of keys per split configurable
2) make the record reader page through rows instead of assuming it can read all
rows into memory at once (a rough sketch of both follows this list)
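
A minimal sketch of both changes, assuming a hypothetical
cassandra.input.split.size property and a fetchPage() stand-in for a
get_range_slices-style call (names are illustrative, not the actual
ConfigHelper/ColumnFamilyRecordReader API):

import java.util.List;
import org.apache.hadoop.conf.Configuration;

public class PagingSketch
{
    // hypothetical property name; the real knob would be exposed to jobs
    // through the hadoop configuration
    static final String INPUT_SPLIT_SIZE = "cassandra.input.split.size";
    static final int DEFAULT_SPLIT_SIZE = 4096;

    // change 1: read the per-split key count from the job configuration
    // instead of hardcoding 4096
    static int getInputSplitSize(Configuration conf)
    {
        return conf.getInt(INPUT_SPLIT_SIZE, DEFAULT_SPLIT_SIZE);
    }

    // minimal row placeholder for the sketch
    static class Row
    {
        final String key;
        Row(String key) { this.key = key; }
    }

    // change 2: page through the split's rows instead of materializing them all
    static void readSplit(String startKey, String endKey, int pageSize)
    {
        String start = startKey;
        boolean skipFirst = false; // range starts are inclusive, so skip the row
                                   // already returned on pages after the first
        while (true)
        {
            List<Row> page = fetchPage(start, endKey, pageSize);
            for (int i = skipFirst ? 1 : 0; i < page.size(); i++)
                emit(page.get(i));
            if (page.size() < pageSize)
                break;                              // short page == end of split
            start = page.get(page.size() - 1).key;  // resume from last key seen
            skipFirst = true;
        }
    }

    // stand-ins so the sketch compiles; a real reader would call Thrift
    static List<Row> fetchPage(String start, String end, int count) { throw new UnsupportedOperationException(); }
    static void emit(Row row) {}
}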
Note: going back to specifying number of splits instead of number of keys is
bad for two reasons. First, it does not work with the standard hadoop
mapred.min.split.size configuration option. Second, it means we have no way of
measuring progress in the record reader, since we have no idea how many keys
are in the split. If we specify the number of keys, then we know (to within a
small margin of error) how many keys to expect, even when paging.
See CASSANDRA-775, CASSANDRA-342 for background.
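
On the progress point: if a split carries an expected key count, the record
reader can report progress as keys read over keys expected. A sketch, where
keysRead and totalKeys are hypothetical fields maintained while paging, not
part of the actual record reader:

// sketch only: progress for a paging record reader whose split carries an
// expected key count
public class ProgressSketch
{
    private long keysRead;   // incremented each time a row is returned
    private long totalKeys;  // expected keys in the split (an estimate)

    public float getProgress()
    {
        if (totalKeys == 0)
            return 0.0f;
        // the expected count is approximate, so cap at 1.0
        return Math.min(1.0f, (float) keysRead / totalKeys);
    }
}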