get_range_slices() always returns list of KeySlice containing all available 
rows even if column size is empty
-------------------------------------------------------------------------------------------------------------

                 Key: CASSANDRA-3777
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3777
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 1.0.7
         Environment: Debian Squeeze
            Reporter: bert Passek


Hi,

we are using Cassandra to store data in super column families with a date as 
their name. We would like to iterate over only those keys that contain data 
matching a given slice range (e.g. a certain day). However, get_range_slices() 
always returns every row in the key range, including rows for which 
getColumnsSize() on the returned KeySlice is 0.

In combination with Hadoop we use ColumnFamilyInputFormat, which currently 
only supports SliceRanges. In our setup we might have billions of rows within a 
column family. Even with a slice range set, we always have to iterate over all 
row keys, which in my opinion doesn't make sense.

Let's have a look at a very simple example:

        Cassandra.Client client =
                ConfigHelper.createConnection("localhost", 9160, true);
        client.set_keyspace("Foo");

        SlicePredicate predicate = new SlicePredicate();
        SliceRange sliceRange = new SliceRange();
        sliceRange.start = Util.bb("I@1327273200");
        sliceRange.finish = Util.bb("I@1327273200~");
        predicate.slice_range = sliceRange;
        
        KeyRange keyRange = new KeyRange();
        keyRange.start_key = Util.bb("");
        keyRange.end_key = Util.bb("");

        List<KeySlice> rows = client.get_range_slices(new ColumnParent("Bar"),
                predicate, keyRange, ConsistencyLevel.ONE);
        
        for (KeySlice slice : rows)
        {
            System.out.println("key: " + new String(slice.getKey())
                    + ", columns: " + slice.getColumnsSize());
        }

This is the output:

key: I@1327359600@14@2074@478@32798@80445@2011@138@205@4320@0, columns: 0
key: I@1327273200@12@1151@139@801@1728@2033@138@219@4476@0, columns: 1
key: I@1327359600@14@2055@359@1032@2078@2011@138@205@4320@0, columns: 0
key: I@1327359600@14@1151@139@801@1728@2011@138@205@4320@0, columns: 0
key: I@1327273200@12@2074@478@32798@80445@2033@138@219@4476@0, columns: 1
key: I@1327273200@12@2055@359@1032@2079@2033@138@219@4476@0, columns: 1


Searching by slice range works fine, but row keys that do not match the given 
slice range are still part of the result list. We filter out such key slices 
by checking their column count, but it would make more sense to get back only 
the keys we are looking for (those with a column count > 0).
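
The client-side filtering we currently do can be sketched roughly as follows. 
This is a minimal, self-contained illustration and not our production code: 
Row and EmptyRowFilter are hypothetical stand-ins (Row plays the role of a 
Thrift KeySlice) so that the snippet compiles without the Cassandra jars.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Thrift KeySlice: a row key plus the columns
// that matched the slice predicate (possibly none).
class Row {
    final String key;
    final List<String> columns;

    Row(String key, List<String> columns) {
        this.key = key;
        this.columns = columns;
    }
}

// Hypothetical helper, not part of any Cassandra API.
class EmptyRowFilter {
    // Keep only rows that have at least one column matching the predicate.
    static List<Row> nonEmpty(List<Row> rows) {
        List<Row> result = new ArrayList<>();
        for (Row row : rows) {
            if (!row.columns.isEmpty()) {
                result.add(row);
            }
        }
        return result;
    }
}
```

Note that this only hides the empty rows from the caller; every row in the 
key range is still fetched and inspected.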

ColumnFamilyRecordReader creates sorted maps from the result list, which means 
creating billions of maps and passing them to the mapper, only for them to be 
thrown away because they have no content.
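
To illustrate the allocation concern, here is a rough sketch of a reader that 
builds a sorted map only for rows that actually have columns. It is an 
assumption-laden toy, not the ColumnFamilyRecordReader implementation: 
SkippingRowReader, its mapsBuilt counter, and the use of plain 
Map<String, String> for a row's columns are all made up for this example.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: wraps an iterator over (row key -> matching columns)
// and allocates a sorted map only for rows with at least one column.
class SkippingRowReader implements Iterator<SortedMap<String, String>> {
    private final Iterator<Map.Entry<String, Map<String, String>>> source;
    private SortedMap<String, String> next; // next non-empty row, or null
    int mapsBuilt = 0;                      // counts allocated maps

    SkippingRowReader(Iterator<Map.Entry<String, Map<String, String>>> source) {
        this.source = source;
        advance();
    }

    // Move to the next row that has columns, skipping empty rows.
    private void advance() {
        next = null;
        while (next == null && source.hasNext()) {
            Map<String, String> columns = source.next().getValue();
            if (columns.isEmpty()) {
                continue; // empty row: no map is allocated
            }
            next = new TreeMap<>(columns);
            mapsBuilt++;
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public SortedMap<String, String> next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        SortedMap<String, String> current = next;
        advance();
        return current;
    }
}
```

This would avoid the wasted maps and mapper calls, but every row key is still 
scanned, so it does not address the underlying cost of the range query.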

The question is: is there a way, using slice ranges, to get only those key 
slices that match the given slice range? Or is there a reason why the 
behaviour is as described above?

Best Regards

Bert Passek
