[jira] Commented: (CASSANDRA-1246) Hadoop output SlicePredicate is slow and doesn't work as intended

Hudson (JIRA) Wed, 07 Jul 2010 06:57:48 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885942#action_12885942 ]


Hudson commented on CASSANDRA-1246:
-----------------------------------

Integrated in Cassandra #488 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/488/])
    r/m Hadoop outputSlicePredicate.  patch by jbellis; reviewed by jhanna for CASSANDRA-1246


> Hadoop output SlicePredicate is slow and doesn't work as intended
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-1246
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1246
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.7
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.7
>
>         Attachments: 1246.txt
>
>
> The output SlicePredicate is only used to try to verify that no data 
> exists in the range we're going to be writing to.  This is 
> (a) slow, since it performs get_range_slices across the entire key range, 
> meaning we'll hit every node in the cluster if there is no data (which is 
> supposed to be the normal case); and 
> (b) wrong, since it appears to be intended to use keyList.size so that data 
> in column X does not interfere with output to column Y, but that is not how 
> get_range_slices works: if you have data (or even a tombstone) in any 
> column, you'll get the key back in your result list.  So what you would have 
> to do is scan every key and check the list of columns returned, which will be 
> prohibitively slow when data actually exists in other columns.
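
Both problems can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Cassandra's code or its Thrift API: `Node`, `KeySlice`, `TOMBSTONE`, `naive_is_empty` and `correct_is_empty` are hypothetical, `get_range_slices` here only borrows the Thrift method's name, and the scan is assumed to walk node ranges in ring order until it has collected `count` keys. Under those assumptions, an empty cluster forces a visit to every node, and a key holding only column X (or only a tombstone in Y) still comes back for a predicate on Y, so the number of keys returned says nothing about column Y.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical marker for a deleted (tombstoned) column value.
TOMBSTONE = object()


@dataclass
class KeySlice:
    """Stand-in for one result row: a key plus its live matching columns."""
    key: str
    columns: List[str]


@dataclass
class Node:
    """Hypothetical node owning one contiguous slice of the token ring."""
    name: str
    rows: Dict[str, Dict[str, object]] = field(default_factory=dict)


def get_range_slices(ring: List[Node], predicate: List[str],
                     count: Optional[int]) -> Tuple[List[KeySlice], int]:
    """Simplified full-ring scan: walk node ranges in ring order until `count`
    keys are collected (None means no limit). Returns (slices, nodes_visited)."""
    slices: List[KeySlice] = []
    visited = 0
    for node in ring:
        visited += 1
        for key in sorted(node.rows):
            row = node.rows[key]
            if not row:
                continue  # no columns at all, not even tombstones
            # The key is returned whenever the row has ANY column (live or
            # tombstoned); only live columns named by the predicate survive.
            live = [c for c in predicate
                    if c in row and row[c] is not TOMBSTONE]
            slices.append(KeySlice(key, live))
            if count is not None and len(slices) >= count:
                return slices, visited
    return slices, visited


def naive_is_empty(ring: List[Node], column: str) -> bool:
    """The flawed check: treat 'no keys returned' as 'no data in column'."""
    slices, _ = get_range_slices(ring, [column], count=1)
    return len(slices) == 0


def correct_is_empty(ring: List[Node], column: str) -> bool:
    """What a correct check needs: scan every key and inspect its columns."""
    slices, _ = get_range_slices(ring, [column], count=None)
    return all(not s.columns for s in slices)


if __name__ == "__main__":
    # (a) With no data anywhere, even a count=1 scan visits every node.
    empty_ring = [Node("n1"), Node("n2"), Node("n3")]
    print(get_range_slices(empty_ring, ["Y"], count=1))  # prints ([], 3)

    # (b) Data in column X and a tombstone in Y; Y holds no live data.
    ring = [Node("n1"),
            Node("n2", {"k1": {"X": "v"}}),
            Node("n3", {"k2": {"Y": TOMBSTONE}})]
    print(naive_is_empty(ring, "Y"), correct_is_empty(ring, "Y"))  # prints False True
```

Note that `correct_is_empty` has to pass `count=None` and read every key's column list, which is exactly the full scan the issue describes as prohibitively slow when other columns hold data.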

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.