[ 
https://issues.apache.org/jira/browse/CASSANDRA-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis reassigned CASSANDRA-342:
----------------------------------------

    Assignee: Jonathan Ellis  (was: Jeff Hodges)

Here's my first stab at Hadoop support.  I took Jeff's patches as a starting 
point, but the many changes we've made to Cassandra's internals since then mean 
the results are pretty different.
 - BootUp is no longer required; instead we use the Fat Client API
 - Switched to ColumnFamily as the unit for InputFormat, rather than KeySpace
 - Use get_range_slice instead of get_key_range
 - Use Tokens instead of Strings for range splitting
 - Add build.xml and bin/ scripts for WordCount demo

The combination of all this means we get RandomPartitioner support for free.  
We also get InputSplit location information for free.
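
To illustrate the one-split-per-node idea with token ranges, here is a minimal 
sketch and not the code in the patch.  It assumes tokens are BigIntegers (as 
with RandomPartitioner) and that the ring is available as a sorted map from 
token to host; the class TokenRangeSplits, the Split type and the onePerNode 
method are hypothetical names made up for this example.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TokenRangeSplits {
    /** Hypothetical stand-in for an InputSplit: the range (start, end] and its owning host. */
    public static final class Split {
        public final BigInteger start;
        public final BigInteger end;
        public final String location;

        Split(BigInteger start, BigInteger end, String location) {
            this.start = start;
            this.end = end;
            this.location = location;
        }
    }

    /**
     * Builds one split per node.  Each node is assumed to own the range
     * (previous node's token, its own token]; the first node's range wraps
     * around from the last token on the ring.
     */
    public static List<Split> onePerNode(TreeMap<BigInteger, String> ring) {
        List<Split> splits = new ArrayList<>();
        if (ring.isEmpty()) {
            return splits;
        }
        BigInteger previous = ring.lastKey(); // wrap-around start for the first node
        for (Map.Entry<BigInteger, String> node : ring.entrySet()) {
            splits.add(new Split(previous, node.getKey(), node.getValue()));
            previous = node.getKey();
        }
        return splits;
    }
}
```

Because each range is tied to the host that owns it, the location for the 
split falls out of the same loop, which is the sense in which location 
information comes for free.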

My patch 01 and 02 correspond to Jeff's 02 and 03 (no changes to Cassandra 
internals have been required so far).  Then my 03 is just more changes to the 
WordCount example (I should probably squash that...)

Still todo: breaking a node's range into multiple InputSplits (this will 
require minor changes to Cassandra)
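
For discussion, one naive way to break a node's range into several pieces is 
plain token arithmetic, sketched below.  This is an assumption-laden 
illustration, not the planned change: it assumes BigInteger tokens on a ring 
of a known size, it ignores how much data actually falls in each piece, and 
the names SubRanges, subdivide and ringSize are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class SubRanges {
    /**
     * Cuts the range (start, end] into `parts` contiguous sub-ranges of
     * roughly equal token width.  The range may wrap past the top of the
     * ring; start == end is treated as the whole ring.
     * Each returned array is {left, right}, meaning (left, right].
     */
    public static List<BigInteger[]> subdivide(BigInteger start, BigInteger end,
                                               int parts, BigInteger ringSize) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        // mod() always returns a non-negative value, so wrapping ranges work.
        BigInteger width = end.subtract(start).mod(ringSize);
        if (width.signum() == 0) {
            width = ringSize;
        }
        List<BigInteger[]> ranges = new ArrayList<>();
        BigInteger left = start;
        for (int i = 1; i <= parts; i++) {
            // The last boundary is pinned to `end` so rounding never loses tokens.
            BigInteger right = (i == parts)
                    ? end
                    : start.add(width.multiply(BigInteger.valueOf(i))
                                     .divide(BigInteger.valueOf(parts)))
                           .mod(ringSize);
            ranges.add(new BigInteger[] { left, right });
            left = right;
        }
        return ranges;
    }
}
```

Equal token width only approximates equal data volume when keys are spread 
evenly over the ring, so treat this as a starting point at best.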

Also: as I have said before, I don't really know Hadoop, so quite possibly I 
did something stupid here.  (For instance, Jeff's InputFormat used Writable 
subclasses for both key and value; mine uses just String and ColumnFamily since 
that is more natural, and the InputFormat contract does not require 
Writable-ness.  Is this Bad Hadoop Form?)

> hadoop integration
> ------------------
>
>                 Key: CASSANDRA-342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-342
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>         Attachments: 
> 0001-v3-CASSANDRA-342.-Set-up-for-the-hadoop-commits.patch, 
> 0001-v4-add-basic-hadoop-support-one-split-per-node.txt, 
> 0002-v3-CASSANDRA-342.-Working-hadoop-support.patch, 
> 0002-v4-add-wordcount-hadoop-example.txt, 
> 0003-v3-CASSANDRA-342.-Adding-the-WordCount-example.patch, 
> 0003-v4-add-WordCountSetup-multiple-tests.txt
>
>
> Some discussion on -dev: 
> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200907.mbox/%[email protected]%3e

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
