[ 
https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923292#action_12923292
 ] 

Stu Hood edited comment on CASSANDRA-1497 at 10/20/10 11:00 PM:
----------------------------------------------------------------

contrib/hadoop_streaming_input/bin/mapper.py
* Mentions the original source multiple times, and claims to be both a mapper 
and reducer
* I suspect that extract_text can be turned into a one-liner somehow

contrib/hadoop_streaming_input/bin/reducer.py
* Needs an Apache header

contrib/hadoop_streaming_input/[input/]README.txt
* Mentions "-input": {{bin/streaming}} should fake the input, and explain why
* There is an extra copy of README.txt in an unused 'input' subdirectory

.../hadoop/ColumnFamilyRecordReader.java
* Indentation needs to be fixed

.../hadoop/streaming/AvroResolver.java
* Updated javadoc

I looked a little bit into the immediate runtime failure, but didn't come to 
any conclusions. One suspicious aspect is that Streaming appears to use the 
result of Resolver.getInputWriterClass to write to both the mapper and reducer 
scripts: see 
http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java?view=markup#l783

> Add input support for Hadoop Streaming
> --------------------------------------
>
>                 Key: CASSANDRA-1497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>             Fix For: 0.7.1
>
>         Attachments: 0001-An-updated-avro-based-input-streaming-solution.patch
>
>
> related to CASSANDRA-1368 - create similar functionality for input streaming.

