[ 
https://issues.apache.org/jira/browse/CASSANDRA-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799454#action_12799454
 ] 

Jonathan Ellis commented on CASSANDRA-342:
------------------------------------------

To get around the Hadoop-stuff-has-to-run-in-a-different-JVM problem: what if 
we had Hadoop operate on Cassandra snapshots?  For the kind of batch-oriented, 
non-latency-sensitive work that Hadoop is a good fit for, that should be 
perfect: the Hadoop Task can open ColumnFamilyStore objects on the 
snapshotted sstables without having to start a full server, which would be 
nasty.

Otherwise, IMO, we should patch Hadoop to allow Tasks to run in an existing 
JVM.  I'm surprised HBase didn't do that: copying *all input* from one JVM to 
another is not insignificant.  (You could take that approach with Cassandra 
too, using getRangeSlice from a StorageProxy started with 
StorageService.initClient -- actually, we would probably want to add an 
initLocalClient to mean "I only plan to query the machine I am on" -- but 
that would be a case of working around bad design instead of fixing it.)


> hadoop integration
> ------------------
>
>                 Key: CASSANDRA-342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-342
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jeff Hodges
>         Attachments: 
> 0001-v3-CASSANDRA-342.-Set-up-for-the-hadoop-commits.patch, 
> 0002-v3-CASSANDRA-342.-Working-hadoop-support.patch, 
> 0003-v3-CASSANDRA-342.-Adding-the-WordCount-example.patch
>
>
> Some discussion on -dev: 
> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200907.mbox/%[email protected]%3e

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
