[ https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923292#action_12923292 ]
Stu Hood commented on CASSANDRA-1497:
-------------------------------------
contrib/hadoop_streaming_input/bin/mapper.py
* Mentions the original source multiple times, and claims to be both a mapper
and reducer
* I suspect that extract_text can be turned into a one-liner somehow
contrib/hadoop_streaming_input/bin/reducer.py
* Needs an Apache header
contrib/hadoop_streaming_input/[input/]README.txt
* Mentions "-input": {{bin/streaming}} should fake the input, and explain why
* There is an extra copy of README.txt in an unused 'input' subdirectory
.../hadoop/ColumnFamilyRecordReader.java
* Indentation
.../hadoop/streaming/AvroResolver.java
* Updated javadoc
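For the missing header in reducer.py, a minimal sketch of where the standard ASF source header would sit in a Python script. The header text is the usual Apache one; the function below it is only a hypothetical placeholder (it counts tab-separated records per key) and is not the logic of the actual reducer.py in the patch.

```python
#!/usr/bin/env python
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
import sys


def count_by_key(lines):
    """Hypothetical placeholder reduce step: count records per key.

    Each line is assumed to be "key<TAB>value"; only the key is used.
    """
    counts = {}
    for line in lines:
        key = line.rstrip("\n").split("\t", 1)[0]
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Streaming reducers read records from stdin and write to stdout.
    for key, count in sorted(count_by_key(sys.stdin).items()):
        print("%s\t%d" % (key, count))
```

The header goes directly after the shebang line and before any imports or code.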
I looked a little bit into the immediate runtime failure, but didn't come to
any conclusions. One suspicious aspect is that Streaming appears to use the
result of Resolver.getInputWriterClass to write to both the mapper and reducer
scripts: see
http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java?view=markup#l783
> Add input support for Hadoop Streaming
> --------------------------------------
>
> Key: CASSANDRA-1497
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1497
> Project: Cassandra
> Issue Type: New Feature
> Components: Hadoop
> Reporter: Jeremy Hanna
> Assignee: Jeremy Hanna
> Fix For: 0.7.1
>
> Attachments: 0001-An-updated-avro-based-input-streaming-solution.patch
>
>
> related to CASSANDRA-1368 - create similar functionality for input streaming.