[jira] Updated: (CASSANDRA-1246) Hadoop output SlicePredicate is slow and doesn't work as intended

Jonathan Ellis (JIRA) Fri, 02 Jul 2010 08:42:46 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jonathan Ellis updated CASSANDRA-1246:
--------------------------------------

    Attachment: 1246.txt

> Hadoop output SlicePredicate is slow and doesn't work as intended
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-1246
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1246
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.7
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.7
>
>         Attachments: 1246.txt
>
>
> The output SlicePredicate is used only to check that no data already
> exists in the range we are about to write.  This check is
> (a) slow, since it performs get_range_slices across the entire key range,
> meaning we'll hit every node in the cluster if there is no data (which is
> supposed to be the normal case), and
> (b) wrong, since it appears to be intended to use keyList.size to allow data
> in column X to not interfere with an output to column Y, but that is not how
> get_range_slices works: if you have data (or even a tombstone) in any
> column, you'll get the key back in your result list.  So what you would have
> to do is scan every key and check the list of columns returned, which, when
> data actually exists in other columns, will be prohibitively slow.
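
Point (b) can be illustrated with a toy model. The sketch below is not
Cassandra or Thrift code: RangeSliceModel, rangeSlice, naiveHasData and
columnAwareHasData are hypothetical names, and rows are modelled as a plain
in-memory map. It assumes only the behaviour described above, namely that a
range slice returns every key that has data in any column, even when none of
the requested columns match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class RangeSliceModel {
    // Hypothetical stand-in for a range slice: every row key comes back,
    // paired with whichever of the wanted columns it holds (possibly none).
    static Map<String, List<String>> rangeSlice(Map<String, Set<String>> rows,
                                                Set<String> wanted) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> row : rows.entrySet()) {
            List<String> matched = new ArrayList<>();
            for (String column : row.getValue()) {
                if (wanted.contains(column)) {
                    matched.add(column);
                }
            }
            result.put(row.getKey(), matched);
        }
        return result;
    }

    // Size-based check: treats any returned key as conflicting data.
    static boolean naiveHasData(Map<String, List<String>> slice) {
        return !slice.isEmpty();
    }

    // Column-aware check: has to visit every returned key's column list.
    static boolean columnAwareHasData(Map<String, List<String>> slice) {
        for (List<String> columns : slice.values()) {
            if (!columns.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two rows with data only in column X; the output targets column Y.
        Map<String, Set<String>> rows = new TreeMap<>();
        rows.put("row1", Set.of("X"));
        rows.put("row2", Set.of("X"));

        Map<String, List<String>> slice = rangeSlice(rows, Set.of("Y"));
        System.out.println("size-based check sees data:   " + naiveHasData(slice));
        System.out.println("column-aware check sees data: " + columnAwareHasData(slice));
    }
}
```

In this model the size-based check reports a conflict although column Y is
empty, while the column-aware check gets the right answer only by walking
every returned key, which is the cost the description calls prohibitive.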

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
