[
https://issues.apache.org/jira/browse/CASSANDRA-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799454#action_12799454
]
Jonathan Ellis commented on CASSANDRA-342:
------------------------------------------
To get around the problem that Hadoop tasks have to run in a different JVM:
what if we had Hadoop operate on Cassandra snapshots? For the kind of
batch-oriented, non-latency-sensitive work that Hadoop is a good fit for, that
should be perfect: the Hadoop Task can open ColumnFamilyStore objects on the
snapshotted sstables without having to start a full server, which is nasty.
Otherwise, IMO we should patch Hadoop to allow Tasks to run in an existing JVM.
I'm surprised HBase didn't do that: copying *all input* from one JVM to
another is not insignificant. (You could take the cross-JVM copying approach
with Cassandra too, using getRangeSlice from a StorageProxy started with
StorageService.initClient. Actually, we would probably want to add an
initLocalClient to mean "I only plan to query the machine I am on." But that
would be a case of working around bad design instead of fixing it.)
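
As a rough illustration of the snapshot idea, here is a minimal sketch of the
first step a task would need: finding the sstable data files inside one
snapshot directory. Everything here is an assumption for illustration only.
The class name SnapshotInputs is hypothetical, the "-Data.db" suffix is assumed
as the naming convention for sstable data files, and the snapshot directory is
supplied by the caller. Actually reading the files would still go through
ColumnFamilyStore, which this sketch does not touch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Cassandra or Hadoop.
public class SnapshotInputs {

    /**
     * Lists the sstable data files directly inside one snapshot directory,
     * sorted by path. Assumes data files end in "-Data.db".
     */
    public static List<Path> listDataFiles(Path snapshotDir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(snapshotDir, "*-Data.db")) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        Collections.sort(files);
        return files;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SnapshotInputs <snapshot-directory>");
            return;
        }
        // Print the files a task running on this machine would open.
        for (Path p : listDataFiles(Paths.get(args[0]))) {
            System.out.println(p);
        }
    }
}
```

The listing is deliberately not split per file across tasks: a row's columns
can be spread over several sstables, so a task would want all of the local
files for a column family, not just one of them.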
> hadoop integration
> ------------------
>
> Key: CASSANDRA-342
> URL: https://issues.apache.org/jira/browse/CASSANDRA-342
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Jeff Hodges
> Attachments:
> 0001-v3-CASSANDRA-342.-Set-up-for-the-hadoop-commits.patch,
> 0002-v3-CASSANDRA-342.-Working-hadoop-support.patch,
> 0003-v3-CASSANDRA-342.-Adding-the-WordCount-example.patch
>
>
> Some discussion on -dev:
> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200907.mbox/%[email protected]%3e