get_range_slices() always returns list of KeySlice containing all available
rows even if column size is empty
-------------------------------------------------------------------------------------------------------------
Key: CASSANDRA-3777
URL: https://issues.apache.org/jira/browse/CASSANDRA-3777
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 1.0.7
Environment: Debian Squeeze
Reporter: Bert Passek
Hi,
we are using Cassandra to store data in super column families with a date as
their name. We would like to iterate only over the keys that contain data
matching a given slice range (e.g. a certain day). However, get_range_slices()
always returns all rows, including those for which getColumnsSize() on the
returned KeySlice is 0.
In combination with Hadoop we use the ColumnFamilyInputFormat, which currently
only supports SliceRanges. In our setup we might have billions of rows within a
column family. Even when we set a slice range, we always have to iterate over
all row keys, which in my opinion doesn't make any sense.
Let's have a look at a very simple example:
Cassandra.Client client = ConfigHelper.createConnection("localhost", 9160, true);
client.set_keyspace("Foo");
SlicePredicate predicate = new SlicePredicate();
SliceRange sliceRange = new SliceRange();
sliceRange.start = Util.bb("I@1327273200");
sliceRange.finish = Util.bb("I@1327273200~");
predicate.slice_range = sliceRange;
KeyRange keyRange = new KeyRange();
keyRange.start_key = Util.bb("");
keyRange.end_key = Util.bb("");
List<KeySlice> rows = client.get_range_slices(new ColumnParent("Bar"), predicate,
        keyRange, ConsistencyLevel.ONE);
for (KeySlice slice : rows)
{
    System.out.println("key: " + new String(slice.getKey()) + ", columns: " + slice.getColumnsSize());
}
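To illustrate why the start/finish pair above acts as a prefix match, here is a
small standalone sketch. It assumes the column family uses a byte-ordered
comparator (an assumption, the comparator is not stated in this report), and the
class name, helper methods and sample column names are made up for the
illustration only.

```java
import java.nio.charset.StandardCharsets;

public class PrefixRangeSketch {
    // Unsigned lexicographic byte comparison, standing in for a
    // byte-ordered column comparator (assumption).
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // True if start <= name <= finish under the comparison above.
    public static boolean inRange(String name, String start, String finish) {
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        byte[] s = start.getBytes(StandardCharsets.UTF_8);
        byte[] f = finish.getBytes(StandardCharsets.UTF_8);
        return compareBytes(n, s) >= 0 && compareBytes(n, f) <= 0;
    }

    public static void main(String[] args) {
        String start = "I@1327273200";
        String finish = "I@1327273200~";
        // Hypothetical column names: one for the requested day, one for the next day.
        System.out.println(inRange("I@1327273200@12@1151", start, finish));
        System.out.println(inRange("I@1327359600@14@2074", start, finish));
    }
}
```

Because '~' sorts after '@' and after the digits in ASCII, every name that
starts with "I@1327273200" followed by those characters falls between start
and finish, while names with a later timestamp sort past finish.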
This is the output:
key: I@1327359600@14@2074@478@32798@80445@2011@138@205@4320@0, columns: 0
key: I@1327273200@12@1151@139@801@1728@2033@138@219@4476@0, columns: 1
key: I@1327359600@14@2055@359@1032@2078@2011@138@205@4320@0, columns: 0
key: I@1327359600@14@1151@139@801@1728@2011@138@205@4320@0, columns: 0
key: I@1327273200@12@2074@478@32798@80445@2033@138@219@4476@0, columns: 1
key: I@1327273200@12@2055@359@1032@2079@2033@138@219@4476@0, columns: 1
Searching by slice range works fine, but row keys that do not match the given
slice range are still part of the result list. We filter out such key slices
by checking their column size, but it would make more sense to get only the
keys we are looking for (which obviously have a column size > 0).
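The client-side filtering we do today looks roughly like the sketch below. To
keep it standalone, Row is a hypothetical stand-in for the Thrift KeySlice (only
the key and the column count are modelled), and the class and method names are
made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyRowFilterSketch {
    // Hypothetical stand-in for KeySlice: a row key plus the number of
    // columns that matched the slice range.
    public static final class Row {
        public final String key;
        public final int columnsSize;

        public Row(String key, int columnsSize) {
            this.key = key;
            this.columnsSize = columnsSize;
        }
    }

    // Keeps only the rows that returned at least one column.
    public static List<Row> nonEmpty(List<Row> rows) {
        List<Row> kept = new ArrayList<>();
        for (Row row : rows) {
            if (row.columnsSize > 0) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Two of the rows from the output above.
        List<Row> rows = Arrays.asList(
                new Row("I@1327359600@14@2074@478@32798@80445@2011@138@205@4320@0", 0),
                new Row("I@1327273200@12@1151@139@801@1728@2033@138@219@4476@0", 1));
        for (Row row : nonEmpty(rows)) {
            System.out.println("key: " + row.key + ", columns: " + row.columnsSize);
        }
    }
}
```

This works, but every empty row still has to be fetched and inspected before it
can be dropped, which is exactly the cost we would like to avoid.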
ColumnFamilyRecordReader creates sorted maps from the result list, which means
creating billions of maps and passing them to the mapper, only for them to be
thrown away because they do not contain any content.
The question is: is there a way, using slice ranges, to get only those key
slices which match the given slice range? Or is there a reason why the
behaviour is as described above?
Best Regards
Bert Passek