[jira] [Commented] (CASSANDRA-5251) Hadoop support should be able to work with multiple column families

Illarion Kovalchuk (JIRA) Tue, 26 Feb 2013 01:04:15 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586953#comment-13586953 ]


Illarion Kovalchuk commented on CASSANDRA-5251:
-----------------------------------------------

Well, in our case we have multiple column families (CFs) that hold different 
aspects of information about the same objects. We want to merge them in a 
single map-reduce pass, so that the mapper gets data from all column families 
(distinguishing them by context.getCurrentSplit()). 

I think you're right. If this does cause random I/O, could you please suggest 
a workaround? 

Thank you.
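
To illustrate the kind of merge described above, here is a minimal sketch in 
plain Java with no Hadoop or Cassandra dependency. It assumes each mapper 
output is tagged with the column family it was read from, and shows the 
per-row-key view a reducer would then see. The class MultiCfMergeSketch, the 
TaggedRow type, and the column family names "profiles" and "logins" are 
hypothetical and are not part of the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultiCfMergeSketch {

    /** Hypothetical record: one value read from one column family for one row key. */
    static final class TaggedRow {
        final String columnFamily;
        final String rowKey;
        final String value;

        TaggedRow(String columnFamily, String rowKey, String value) {
            this.columnFamily = columnFamily;
            this.rowKey = rowKey;
            this.value = value;
        }
    }

    /**
     * Groups rows by row key, then by source column family. This mimics what a
     * reducer would assemble if every mapper emitted (rowKey, (columnFamily, value)).
     * If the same column family appears twice for a key, the later value wins.
     */
    static Map<String, Map<String, String>> mergeByRowKey(List<TaggedRow> rows) {
        Map<String, Map<String, String>> merged = new TreeMap<>();
        for (TaggedRow row : rows) {
            merged.computeIfAbsent(row.rowKey, k -> new TreeMap<>())
                  .put(row.columnFamily, row.value);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<TaggedRow> rows = new ArrayList<>();
        rows.add(new TaggedRow("profiles", "user1", "alice"));
        rows.add(new TaggedRow("logins", "user1", "2013-02-26"));
        rows.add(new TaggedRow("profiles", "user2", "bob"));
        // Each row key maps to the values found for it in each column family.
        System.out.println(mergeByRowKey(rows));
    }
}
```

In a real job the tag would come from the input split rather than from a 
field set by hand; this sketch only shows the grouping step.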
                
> Hadoop support should be able to work with multiple column families
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-5251
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5251
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>    Affects Versions: 1.1.0, 1.1.11, 1.2.0, 2.0
>            Reporter: Illarion Kovalchuk
>            Priority: Minor
>         Attachments: trunk-5251.txt
>
>
> This patch affects the API, so I also updated the Hadoop example in it. The 
> main difference is that ColumnFamilyInputFormat now generates splits for all 
> input column families, and ColumnFamilyOutputFormat works not with 
> List<Mutation> but with List<Pair<String,Mutation>>, where Pair.left is the 
> column family name.
> Thank you
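
The output shape described in the issue can be sketched as follows. This is 
an illustration only, not code from trunk-5251.txt: Pair and Mutation here 
are simplified hypothetical stand-ins for the real Cassandra classes, and 
groupByColumnFamily is an assumed helper showing how a writer might route 
each mutation using Pair.left as the column family name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiCfOutputSketch {

    /** Hypothetical stand-in for a (left, right) pair; not Cassandra's own Pair class. */
    static final class Pair<L, R> {
        final L left;
        final R right;

        Pair(L left, R right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Hypothetical stand-in for a Thrift Mutation, reduced to a column name and value. */
    static final class Mutation {
        final String column;
        final String value;

        Mutation(String column, String value) {
            this.column = column;
            this.value = value;
        }
    }

    /**
     * Splits a List<Pair<String, Mutation>> into one mutation list per column
     * family, keyed by Pair.left. Insertion order of column families is kept.
     */
    static Map<String, List<Mutation>> groupByColumnFamily(List<Pair<String, Mutation>> tagged) {
        Map<String, List<Mutation>> byColumnFamily = new LinkedHashMap<>();
        for (Pair<String, Mutation> pair : tagged) {
            byColumnFamily.computeIfAbsent(pair.left, k -> new ArrayList<>()).add(pair.right);
        }
        return byColumnFamily;
    }

    public static void main(String[] args) {
        List<Pair<String, Mutation>> tagged = new ArrayList<>();
        tagged.add(new Pair<>("profiles", new Mutation("name", "alice")));
        tagged.add(new Pair<>("logins", new Mutation("last", "2013-02-26")));
        tagged.add(new Pair<>("profiles", new Mutation("email", "alice@example.com")));

        // Print how many mutations would go to each column family.
        for (Map.Entry<String, List<Mutation>> entry : groupByColumnFamily(tagged).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue().size() + " mutation(s)");
        }
    }
}
```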

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
