[ https://issues.apache.org/jira/browse/CASSANDRA-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-1246:
--------------------------------------
Attachment: 1246.txt
> Hadoop output SlicePredicate is slow and doesn't work as intended
> -----------------------------------------------------------------
>
> Key: CASSANDRA-1246
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1246
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Affects Versions: 0.7
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Fix For: 0.7
>
> Attachments: 1246.txt
>
>
> The output SlicePredicate is only used to attempt to check that no data
> already exists in the range we are about to write to. This is
> (a) slow, since it performs get_range_slices across the entire key range,
> meaning we'll hit every node in the cluster if there is no data (which is
> supposed to be the normal case), and
> (b) wrong, since it appears to be intended to use keyList.size to allow data
> in column X to not interfere with an output to column Y, but that is not how
> get_range_slices works: if you have data (or even a tombstone) in any
> column, you'll get the key back in your result list. So what you would have
> to do is scan every key and check the list of columns returned, which, in the
> case of data actually existing in other columns, will be prohibitively slow.
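
Point (b) can be illustrated with a small, self-contained sketch. This is a toy in-memory model, not the Cassandra Thrift API: the class RangeSliceSketch, its KeySlice holder, and the methods rangeSlice, looksOccupiedBySize and occupiedByColumnScan are hypothetical names introduced only for illustration. The sketch assumes the behavior described above, namely that every key in the range is returned even when none of its columns match the predicate. It does not model tombstones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model for illustration only; none of these names exist in Cassandra.
class RangeSliceSketch {

    // Hypothetical stand-in for one row of a range-slice result:
    // the row key plus the columns that matched the predicate.
    static final class KeySlice {
        final String key;
        final List<String> columns;

        KeySlice(String key, List<String> columns) {
            this.key = key;
            this.columns = columns;
        }
    }

    // Assumed behavior, per the description above: every key in the range
    // comes back, even if no column of that row matches the predicate.
    static List<KeySlice> rangeSlice(Map<String, Set<String>> rows, Set<String> predicate) {
        List<KeySlice> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> row : rows.entrySet()) {
            List<String> matched = new ArrayList<>();
            for (String column : row.getValue()) {
                if (predicate.contains(column)) {
                    matched.add(column);
                }
            }
            result.add(new KeySlice(row.getKey(), matched));
        }
        return result;
    }

    // The size-based check: reports "occupied" whenever any key is returned.
    static boolean looksOccupiedBySize(List<KeySlice> slices) {
        return !slices.isEmpty();
    }

    // The check that would be needed instead: inspect the columns of every key.
    // Its cost grows with the number of keys that hold data in other columns.
    static boolean occupiedByColumnScan(List<KeySlice> slices) {
        for (KeySlice slice : slices) {
            if (!slice.columns.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two rows that hold data only in column X.
        Map<String, Set<String>> rows = new TreeMap<>();
        rows.put("k1", Collections.singleton("X"));
        rows.put("k2", Collections.singleton("X"));

        // The job intends to write column Y, so it asks about Y only.
        List<KeySlice> slices = rangeSlice(rows, Collections.singleton("Y"));

        System.out.println(looksOccupiedBySize(slices));   // prints "true": false alarm
        System.out.println(occupiedByColumnScan(slices));  // prints "false": Y is actually free
    }
}
```

In this model the size-based check reports a conflict although column Y is empty, while the column scan gives the right answer only by visiting every returned key, which is the cost the description calls prohibitive.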
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.