[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1122:
-----------------------------------------------

    Attachment: patch-1122.txt

Attaching a patch which does the following:
* Deprecates all the library classes in streaming, such as AutoInputFormat, 
StreamInputFormat and StreamXmlRecordReader, and adds new classes which use 
the new api. 
* Changes the tools DumpTypedBytes and LoadTypedBytes to use the new api 
classes.
* Adds StreamJobConfig holding all the configuration properties used in 
streaming.
* Adds classes StreamingMapper, StreamingReducer and StreamingCombiner which 
extend the new api Mapper and Reducer classes.
  ** Adds a class StreamingProcess which starts the streaming process and the 
MR output/error threads, waits for the threads, etc. This functionality is in 
PipeMapRed.java for the old api mapper/reducer; PipeMapper and PipeReducer 
extend PipeMapRed and implement the old Mapper/Reducer interfaces. We cannot 
make StreamingMapper/StreamingReducer extend StreamingProcess because in the 
new api Mapper and Reducer are classes, not interfaces, and a class can extend 
only one superclass. So this functionality is moved into a separate class 
which StreamingMapper/StreamingReducer compose.
  ** InputWriter and OutputReader, added in HADOOP-1722, take a PipeMapRed 
instance as a constructor parameter. That no longer makes sense because, for 
the new api mapper/reducer, process handling is done by a separate class, 
StreamingProcess. So I made the following incompatible change (it looks clean 
now):
  *** Changes the OutputReader constructor to take a DataInput as parameter, 
instead of a PipeMapRed
  *** Changes the InputWriter constructor to take a DataOutput as parameter, 
instead of a PipeMapRed
* Moves some utility methods in PipeMapRed to StreamUtil.
* Removes the deprecated StreamJob(String[] argv, boolean mayExit); deprecates 
static public JobConf createJob(String[] argv); and adds static public Job 
createStreamingJob(String[] argv).
* Refactors setJobConf() into multiple setters which set the appropriate 
mapper/reducer in use.
* Adds unit tests for all the use cases described 
[above|https://issues.apache.org/jira/browse/MAPREDUCE-1122?focusedCommentId=12878515&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12878515]
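
To illustrate the composition and the constructor change described above, 
here is a rough sketch. It is not code from the patch: every Sketch* class is 
a hypothetical stand-in, Hadoop is not on the classpath, and an in-memory 
buffer plays the role of the child process so that the example runs without 
an external command. Only java.io.DataInput and java.io.DataOutput are real 
types here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for the new api Mapper, which is a class and
// therefore takes up the single superclass slot of any mapper.
abstract class SketchMapper {
    abstract void map(String key, String value) throws IOException;
}

// Hypothetical stand-in for InputWriter: it depends only on a DataOutput,
// not on the object that manages the process.
class SketchInputWriter {
    private final DataOutput out;

    SketchInputWriter(DataOutput out) {
        this.out = out;
    }

    void write(String key, String value) throws IOException {
        out.writeBytes(key + "\t" + value + "\n");
    }
}

// Hypothetical stand-in for OutputReader: it depends only on a DataInput.
class SketchOutputReader {
    private final DataInput in;

    SketchOutputReader(DataInput in) {
        this.in = in;
    }

    // Returns the next line, or null when there is no more output.
    String readLine() throws IOException {
        return in.readLine();
    }
}

// Hypothetical stand-in for StreamingProcess. Assumption: the "process"
// simply echoes its input, modelled here with an in-memory buffer.
class SketchStreamingProcess {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final DataOutputStream stdin = new DataOutputStream(buffer);

    DataOutput getProcessInput() {
        return stdin;
    }

    DataInput getProcessOutput() throws IOException {
        stdin.flush();
        return new DataInputStream(
            new ByteArrayInputStream(buffer.toByteArray()));
    }
}

// The mapper extends the Mapper class and composes the process handler,
// since it cannot extend both.
class SketchStreamingMapper extends SketchMapper {
    private final SketchStreamingProcess process =
        new SketchStreamingProcess();
    private final SketchInputWriter writer =
        new SketchInputWriter(process.getProcessInput());

    @Override
    void map(String key, String value) throws IOException {
        writer.write(key, value);
    }

    SketchOutputReader outputReader() throws IOException {
        return new SketchOutputReader(process.getProcessOutput());
    }
}
```

Because the writer and reader see only DataOutput/DataInput, the same classes 
could in principle be driven by the old PipeMapRed or by the new 
StreamingProcess, which is the point of the incompatible change.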


> streaming with custom input format does not support the new API
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-1122
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.20.1
>         Environment: any OS
>            Reporter: Keith Jackson
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1122.txt
>
>
> When trying to implement a custom input format for use with streaming, I have 
> found that streaming does not support the new API, 
> org.apache.hadoop.mapreduce.InputFormat, but requires the old API, 
> org.apache.hadoop.mapred.InputFormat.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
