[
https://issues.apache.org/jira/browse/CASSANDRA-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434994#comment-13434994
]
Goir Riog commented on CASSANDRA-3777:
--------------------------------------
Hi,
what's the status on this one?
This "bug" still exists in all versions. Is there any reason why it behaves as
described above?
It's a huge network and processing overhead that could easily be avoided.
A short comment on this one would be nice.
Thanks
Goir
> get_range_slices() always returns list of KeySlice containing all available
> rows even if column size is empty
> -------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-3777
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3777
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 1.0.7
> Environment: Debian Squeeze
> Reporter: bert Passek
>
> Hi,
> we are using Cassandra to store data in super column families, using a date as
> the super column name. We would like to iterate only over the keys containing
> data that matches a given slice range (e.g. a certain day). However,
> get_range_slices() always returns all rows, including rows for which
> getColumnsSize() on the returned KeySlice is 0.
> In combination with Hadoop we use the ColumnFamilyInputFormat, which currently
> only supports SliceRanges. In our setup we might have billions of rows within
> a column family. Even when we set a slice range, we always have to iterate over
> all row keys, which in my opinion doesn't make any sense.
> Let's have a look at a very simple example:
> Cassandra.Client client = ConfigHelper.createConnection("localhost", 9160, true);
> client.set_keyspace("Foo");
>
> // Select only the super columns for one day (names prefixed with I@1327273200)
> SlicePredicate predicate = new SlicePredicate();
> SliceRange sliceRange = new SliceRange();
> sliceRange.start = Util.bb("I@1327273200");
> sliceRange.finish = Util.bb("I@1327273200~");
> predicate.slice_range = sliceRange;
>
> // Empty start and end keys: scan the whole key range
> KeyRange keyRange = new KeyRange();
> keyRange.start_key = Util.bb("");
> keyRange.end_key = Util.bb("");
> List<KeySlice> rows = client.get_range_slices(new ColumnParent("Bar"), predicate,
>         keyRange, ConsistencyLevel.ONE);
>
> for (KeySlice slice : rows)
> {
>     System.out.println("key: " + new String(slice.getKey()) + ", columns: " + slice.getColumnsSize());
> }
> This is the output:
> key: I@1327359600@14@2074@478@32798@80445@2011@138@205@4320@0, columns: 0
> key: I@1327273200@12@1151@139@801@1728@2033@138@219@4476@0, columns: 1
> key: I@1327359600@14@2055@359@1032@2078@2011@138@205@4320@0, columns: 0
> key: I@1327359600@14@1151@139@801@1728@2011@138@205@4320@0, columns: 0
> key: I@1327273200@12@2074@478@32798@80445@2033@138@219@4476@0, columns: 1
> key: I@1327273200@12@2055@359@1032@2079@2033@138@219@4476@0, columns: 1
> Searching by slice range works fine, but all other row keys that do not match
> the given slice range are still part of the result list. We are filtering out
> such key slices by checking their column count, but it would make more sense to
> get only the keys we are looking for (which obviously have a column count
> greater than 0).
> ColumnFamilyRecordReader creates a sorted map for each entry in the result list,
> which means creating billions of maps and passing them to the mapper, only for
> them to be thrown away because they contain no columns.
> The question is: is there a way, using slice ranges, to get only those key
> slices that match the given slice range? Or is there a reason why it behaves as
> described above?
> Best Regards
> Bert Passek
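For readers less familiar with the Thrift API, the behaviour described in the report can be modelled with a small, self-contained Java sketch. It assumes, as the report observes, that get_range_slices() returns one KeySlice for every row in the key range and applies the slice predicate only within each row. No Cassandra classes are used: Slice is a hypothetical stand-in for KeySlice, nested TreeMaps stand in for a column family (super columns flattened to plain column names), column names are compared as Java strings rather than with the column family's comparator, and the row keys k1..k3 are made-up sample data. nonEmpty() mirrors the reporter's client-side workaround of discarding slices with zero columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model only: no Cassandra classes are used.
public class RangeSliceSketch {

    // Hypothetical stand-in for KeySlice: a row key and the column names that matched.
    static final class Slice {
        final String key;
        final List<String> columns;

        Slice(String key, List<String> columns) {
            this.key = key;
            this.columns = columns;
        }
    }

    // Assumed semantics: one Slice per row in the key range; the [start, finish]
    // column slice is applied inside each row, so non-matching rows come back empty.
    static List<Slice> rangeSlices(NavigableMap<String, NavigableMap<String, String>> cf,
                                   String start, String finish) {
        List<Slice> result = new ArrayList<>();
        for (Map.Entry<String, NavigableMap<String, String>> row : cf.entrySet()) {
            NavigableMap<String, String> matching = row.getValue().subMap(start, true, finish, true);
            result.add(new Slice(row.getKey(), new ArrayList<>(matching.keySet())));
        }
        return result;
    }

    // The reporter's client-side workaround: drop slices without any columns.
    static List<Slice> nonEmpty(List<Slice> slices) {
        List<Slice> result = new ArrayList<>();
        for (Slice s : slices) {
            if (!s.columns.isEmpty()) {
                result.add(s);
            }
        }
        return result;
    }

    // Builds one hypothetical row (column name -> dummy value).
    private static NavigableMap<String, String> row(String... columnNames) {
        NavigableMap<String, String> columns = new TreeMap<>();
        for (String name : columnNames) {
            columns.put(name, "v");
        }
        return columns;
    }

    // Hypothetical sample data: row key -> (column name -> value).
    static NavigableMap<String, NavigableMap<String, String>> sampleColumnFamily() {
        NavigableMap<String, NavigableMap<String, String>> cf = new TreeMap<>();
        cf.put("k1", row("I@1327273200@a"));
        cf.put("k2", row("I@1327359600@b"));
        cf.put("k3", row("I@1327273200@c", "I@1327359600@d"));
        return cf;
    }

    public static void main(String[] args) {
        List<Slice> all = rangeSlices(sampleColumnFamily(), "I@1327273200", "I@1327273200~");
        // prints: key: k1, columns: 1 / key: k2, columns: 0 / key: k3, columns: 1
        for (Slice s : all) {
            System.out.println("key: " + s.key + ", columns: " + s.columns.size());
        }
        // prints: non-empty slices: 2
        System.out.println("non-empty slices: " + nonEmpty(all).size());
    }
}
```

Filtering after the fact keeps the result correct but does not reduce the work: in this model every empty slice has already been produced before nonEmpty() discards it, which corresponds to the network and processing overhead the comment above complains about.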