[jira] Commented: (CASSANDRA-342) hadoop integration

Vijay (JIRA) Fri, 29 Jan 2010 11:10:57 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806447#action_12806447 ]


Vijay commented on CASSANDRA-342:
---------------------------------

Hadoop integration might need the following:

1) An API to return the list of splits, given the number of splits.
Using these tokens we can spawn an equal number of MR jobs. A configuration
setting in the MR job, chosen according to the complexity of the processing,
will say how many map tasks to run per partition, and those processes are
then spawned.
-- We have getSplit(int count), which will do it for us.

2) A "start token to stream" API.
Input will be Range(String startKey, Token start, Token finish, int limit).
The return will be:
    If startKey is empty, we will use the start token as the starting point for
the stream; otherwise we will use startKey to specify the key to start with.
Does that make sense?
-- Needs an additional method.
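The splitting step in item 1 can be sketched as cutting a token interval into `count` contiguous splits of near-equal width, one per map task. This is a minimal illustration, not the existing getSplit(int count): the class `TokenSplitter`, its `split` method, the `TokenRange` holder, and the assumption of an MD5-style token space from 0 to 2^127 (with `BigInteger` standing in for `Token`) are all hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class TokenSplitter {
    // Assumed token space: 0 .. 2^127, as with an MD5-based (RandomPartitioner-style) ring.
    static final BigInteger RING_MAX = BigInteger.valueOf(2).pow(127);

    /** Hypothetical split handed to one map task: tokens in (start, end]. */
    static final class TokenRange {
        final BigInteger start;
        final BigInteger end;

        TokenRange(BigInteger start, BigInteger end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "(" + start + ", " + end + "]";
        }
    }

    /** Cuts (lo, hi] into `count` contiguous, non-overlapping splits of near-equal width. */
    static List<TokenRange> split(BigInteger lo, BigInteger hi, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        if (hi.compareTo(lo) <= 0) {
            throw new IllegalArgumentException("hi must be greater than lo");
        }
        BigInteger width = hi.subtract(lo);
        BigInteger n = BigInteger.valueOf(count);
        List<TokenRange> splits = new ArrayList<>(count);
        BigInteger prev = lo;
        for (int i = 1; i <= count; i++) {
            // Boundary i sits at lo + width * i / count; for i == count this is exactly hi.
            BigInteger next = lo.add(width.multiply(BigInteger.valueOf(i)).divide(n));
            splits.add(new TokenRange(prev, next));
            prev = next;
        }
        return splits;
    }
}
```

For example, split(BigInteger.ZERO, TokenSplitter.RING_MAX, 4) yields four splits of 2^125 tokens each. Computing each boundary from lo rather than adding a fixed step keeps the last split ending exactly at hi even when the width does not divide evenly.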

So each MR job will get a range of data from Cassandra and process it; it can
also stream the data rather than fetching all of it at once.
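The proposed Range(startKey, start, finish, limit) call and the streaming idea can be sketched together against an in-memory stand-in for a node's rows. Everything here is hypothetical: `RangeStreamSketch`, `RangeRequest`, `getRange`, and `stream` are illustrative names, `BigInteger` stands in for `Token`, and tokens are assumed distinct with start < finish (no wrap-around). It also assumes the resume pattern: each page is requested with the last key seen as startKey, and because startKey is included in the result, one extra row is requested and skipped.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Consumer;

class RangeStreamSketch {
    /** Hypothetical stand-in for Range(startKey, start, finish, limit); BigInteger plays the role of Token. */
    static final class RangeRequest {
        final String startKey; // empty string means "begin just after the start token"
        final BigInteger start;
        final BigInteger finish;
        final int limit;

        RangeRequest(String startKey, BigInteger start, BigInteger finish, int limit) {
            this.startKey = startKey;
            this.start = start;
            this.finish = finish;
            this.limit = limit;
        }
    }

    // In-memory stand-in for a node's rows, ordered by token (assumes distinct tokens, start < finish).
    private final NavigableMap<BigInteger, String> rowsByToken = new TreeMap<>();
    private final Map<String, BigInteger> tokenOfKey = new HashMap<>();

    void put(BigInteger token, String key) {
        rowsByToken.put(token, key);
        tokenOfKey.put(key, token);
    }

    /** Returns at most req.limit keys from (start, finish], or from startKey onward when it is set. */
    List<String> getRange(RangeRequest req) {
        NavigableMap<BigInteger, String> window;
        if (req.startKey.isEmpty()) {
            window = rowsByToken.subMap(req.start, false, req.finish, true);
        } else {
            BigInteger from = tokenOfKey.get(req.startKey);
            if (from == null) {
                throw new IllegalArgumentException("unknown startKey: " + req.startKey);
            }
            window = rowsByToken.subMap(from, true, req.finish, true); // startKey itself is included
        }
        List<String> out = new ArrayList<>();
        for (String key : window.values()) {
            if (out.size() == req.limit) {
                break;
            }
            out.add(key);
        }
        return out;
    }

    /** Feeds every key in (start, finish] to sink one page at a time instead of fetching it all. */
    void stream(BigInteger start, BigInteger finish, int pageSize, Consumer<String> sink) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be >= 1");
        }
        String resumeKey = "";
        while (true) {
            // A resumed page repeats the resume key first, so request one extra row and skip it.
            int limit = resumeKey.isEmpty() ? pageSize : pageSize + 1;
            List<String> page = getRange(new RangeRequest(resumeKey, start, finish, limit));
            List<String> fresh = resumeKey.isEmpty() ? page : page.subList(Math.min(1, page.size()), page.size());
            if (fresh.isEmpty()) {
                return;
            }
            fresh.forEach(sink);
            if (page.size() < limit) {
                return; // short page: the range is exhausted
            }
            resumeKey = fresh.get(fresh.size() - 1);
        }
    }
}
```

With this shape, an MR job handed the token bounds of its range only ever holds pageSize rows in memory, which is the "doesn't need to get all of it" property described above.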

> hadoop integration
> ------------------
>
>                 Key: CASSANDRA-342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-342
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jeff Hodges
>         Attachments: 
> 0001-v3-CASSANDRA-342.-Set-up-for-the-hadoop-commits.patch, 
> 0002-v3-CASSANDRA-342.-Working-hadoop-support.patch, 
> 0003-v3-CASSANDRA-342.-Adding-the-WordCount-example.patch
>
>
> Some discussion on -dev: 
> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200907.mbox/%[email protected]%3e

