[
https://issues.apache.org/jira/browse/CASSANDRA-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Ellis reassigned CASSANDRA-342:
----------------------------------------
Assignee: Jonathan Ellis (was: Jeff Hodges)
Here's my first stab at Hadoop support. I took Jeff's patches as a starting
point, but the many changes we've made to Cassandra's internals since then mean
the results are pretty different.
- BootUp is no longer required; instead we use the Fat Client API
- Switched to ColumnFamily as the unit for InputFormat, rather than KeySpace
- Use get_range_slice instead of get_key_range
- Use Tokens instead of Strings for range splitting
- Add build.xml and bin/ scripts for WordCount demo
The combination of all this means we get RandomPartitioner support for free.
We also get InputSplit location information for free.
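
To illustrate the token-based splitting described above, here is a minimal
sketch of building one split per node from a token ring. It is not the code
from the attached patches: TokenRingSplits, Split, and oneSplitPerNode are
hypothetical names, and the ring is assumed to be available as a plain map
from each node's token to its host. Because the ranges are built from numeric
tokens rather than key strings, the same logic applies regardless of how keys
hash onto the ring, and each split carries the host that owns it.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; these are not Cassandra or Hadoop classes. */
class TokenRingSplits {

    /** Hypothetical split: the token range (start, end] plus the host that owns it. */
    static final class Split {
        final BigInteger start;
        final BigInteger end;
        final String location;

        Split(BigInteger start, BigInteger end, String location) {
            this.start = start;
            this.end = end;
            this.location = location;
        }
    }

    /**
     * Builds one split per node. The input map (node token -> host) is an
     * assumption made for this sketch; a node with token T is taken to own
     * the range (previous token, T].
     */
    static List<Split> oneSplitPerNode(Map<BigInteger, String> ring) {
        TreeMap<BigInteger, String> sorted = new TreeMap<>(ring);
        List<Split> splits = new ArrayList<>();
        if (sorted.isEmpty()) {
            return splits;
        }
        // The range of the lowest token wraps around from the highest token.
        BigInteger previous = sorted.lastKey();
        for (Map.Entry<BigInteger, String> entry : sorted.entrySet()) {
            splits.add(new Split(previous, entry.getKey(), entry.getValue()));
            previous = entry.getKey();
        }
        return splits;
    }
}
```

The location field is what a Hadoop InputSplit could report from
getLocations() so that map tasks are scheduled near the data.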
My patches 01 and 02 correspond to Jeff's 02 and 03 (no changes to Cassandra
internals have been required so far). Then my 03 is just more changes to the
WordCount example (I should probably squash that...)
Still todo: breaking a node's range into multiple InputSplits (this will
require minor changes to Cassandra)
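
As a rough sketch of what breaking one node's range into several splits could
look like on the token side, the code below cuts a range (start, end] into
equal-width sub-ranges with BigInteger arithmetic. This is an assumption-laden
illustration, not the planned implementation: TokenRangeSubdivider and
subdivide are hypothetical names, the ring size is passed in explicitly
(2^127 is assumed here for MD5-based tokens), and equal token width does not
guarantee an equal number of rows per split.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not part of Cassandra. */
class TokenRangeSubdivider {

    /** Assumed ring size for MD5-based tokens. */
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    /**
     * Cuts the range (start, end] into at most `pieces` contiguous sub-ranges,
     * each returned as a {left, right} pair meaning (left, right].
     * A range with start == end is treated as covering the whole ring.
     */
    static List<BigInteger[]> subdivide(BigInteger start, BigInteger end,
                                        int pieces, BigInteger ringSize) {
        if (pieces < 1) {
            throw new IllegalArgumentException("pieces must be >= 1");
        }
        // Width of the range, accounting for wrap-around past the end of the ring.
        BigInteger width = end.subtract(start).mod(ringSize);
        if (width.signum() == 0) {
            width = ringSize;
        }
        List<BigInteger[]> result = new ArrayList<>();
        BigInteger n = BigInteger.valueOf(pieces);
        BigInteger left = start;
        BigInteger previousOffset = BigInteger.ZERO;
        for (int i = 1; i <= pieces; i++) {
            BigInteger offset = width.multiply(BigInteger.valueOf(i)).divide(n);
            if (offset.equals(previousOffset)) {
                continue; // range narrower than the piece count: skip empty pieces
            }
            // Use the exact end token for the final piece.
            BigInteger right = (i == pieces) ? end : start.add(offset).mod(ringSize);
            result.add(new BigInteger[] {left, right});
            left = right;
            previousOffset = offset;
        }
        return result;
    }
}
```

Each resulting pair could back its own InputSplit, all reporting the same
node as their location.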
Also: as I have said before, I don't really know Hadoop, so quite possibly I
did something stupid here. (For instance, Jeff's InputFormat used Writable
subclasses for both key and value; mine uses just String and ColumnFamily since
that is more natural, and the InputFormat contract does not require
Writable-ness. Is this Bad Hadoop Form?)
> hadoop integration
> ------------------
>
> Key: CASSANDRA-342
> URL: https://issues.apache.org/jira/browse/CASSANDRA-342
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Attachments:
> 0001-v3-CASSANDRA-342.-Set-up-for-the-hadoop-commits.patch,
> 0001-v4-add-basic-hadoop-support-one-split-per-node.txt,
> 0002-v3-CASSANDRA-342.-Working-hadoop-support.patch,
> 0002-v4-add-wordcount-hadoop-example.txt,
> 0003-v3-CASSANDRA-342.-Adding-the-WordCount-example.patch,
> 0003-v4-add-WordCountSetup-multiple-tests.txt
>
>
> Some discussion on -dev:
> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200907.mbox/%[email protected]%3e