You'll have a lot more luck with Pig or Hive as a high-level Hadoop client than with Python, at least until 1470 is done for real.
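As for the InstantiationError in the quoted trace: it is consistent with Cassandra's Hadoop code having been compiled against a Hadoop release where org.apache.hadoop.mapreduce.JobContext is a concrete class (0.20.x) but run against one where it is an interface (0.21+; the "mapred.job.tracker is deprecated" warning and the JobSubmitter frames suggest 0.21). A quick way to check what is actually on the classpath is a small reflection probe like the sketch below. ClassKindCheck and describe() are hypothetical helpers written for this illustration, not part of Cassandra or Hadoop.

```java
import java.lang.reflect.Modifier;

// Hypothetical diagnostic helper (not part of Cassandra or Hadoop): reports whether a
// named type on the current classpath is an interface, an abstract class, or a class.
public class ClassKindCheck {

    /** Returns "interface", "abstract class", "class", or "missing" for the given name. */
    static String describe(String className) {
        try {
            // initialize=false: load the type without running its static initializers
            Class<?> c = Class.forName(className, false, ClassKindCheck.class.getClassLoader());
            if (c.isInterface()) {
                return "interface"; // "new JobContext(...)" fails at runtime in this case
            }
            return Modifier.isAbstract(c.getModifiers()) ? "abstract class" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            // not on the classpath, or one of its dependencies is missing
            return "missing";
        }
    }

    public static void main(String[] args) {
        // Defaults to the type named in the InstantiationError above
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.mapreduce.JobContext";
        System.out.println(name + " is: " + describe(name));
    }
}
```

Run it with the same classpath the streaming job uses (for example, with the Hadoop jars added via -cp). If it reports "interface", the patched Cassandra classes need to be built against the same Hadoop version they run on.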
Brisk does the Hadoop-on-Cassandra integration for you: http://www.datastax.com/docs/0.8/brisk/about_brisk#key-features-of-brisk

On Mon, May 9, 2011 at 2:37 AM, Danhang Tang <da...@zugoservices.com> wrote:
> Hi all,
>
> I've been trying to apply this patch to Cassandra but ran into some errors.
> https://issues.apache.org/jira/browse/CASSANDRA-1497
>
> The comments said it's fixed for version 0.7.1. But I can't directly apply
> it to this version. So I apply it manually to the java files in hadoop
> package. Compiling was successful. But then when executing the
> hadoop_streaming_input I encountered a runtime error:
>
> 11/05/06 17:27:21 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
>
> packageJobJar: [./bin/../../../interface/avro/cassandra.avpr, ./bin/mapper.py, ./bin/reducer.py, /tmp/hadoop-radfactory/hadoop-unjar8363580286439315517/] [] /tmp/streamjob4200946905356051819.jar tmpDir=null
>
> 11/05/06 17:27:23 INFO mapreduce.JobSubmitter: Cleaning up the staging area hdfs://client1:9001/tmp/hadoop-root/mapred/staging/radfactory/.staging/job_201105051628_0015
>
> Exception in thread "main" java.lang.InstantiationError: org.apache.hadoop.mapreduce.JobContext
>     at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:138)
>     at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:428)
>     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:420)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:338)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:960)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:534)
>     at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:924)
>     at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:123)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
>     at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
>
> Any ideas?
>
> Thanks,
>
> Danny

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com