[
https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amareshwari Sriramadasu updated MAPREDUCE-1122:
-----------------------------------------------
Attachment: patch-1122.txt
Attaching a patch which does the following:
* Deprecates all the library classes in streaming, such as AutoInputFormat,
StreamInputFormat, StreamXmlRecordReader etc., and adds new classes which use
the new API.
* Changes the tools DumpTypedBytes and LoadTypedBytes to use the new API classes.
* Adds StreamJobConfig holding all the configuration properties used in
streaming.
* Adds classes StreamingMapper, StreamingReducer and StreamingCombiner, which
extend the new API Mapper and Reducer classes.
** Adds a class StreamingProcess, which starts the streaming process and the MR
output/error threads, waits for those threads, and so on. For the old API
mapper/reducer this functionality is in PipeMapRed.java; PipeMapper and
PipeReducer extend PipeMapRed and implement the old Mapper/Reducer interfaces.
StreamingMapper/StreamingReducer cannot extend StreamingProcess, because in the
new API Mapper and Reducer are classes, not interfaces. The functionality is
therefore moved into a separate class which StreamingMapper/StreamingReducer
compose.
** InputWriter and OutputReader, added in HADOOP-1722, take a PipeMapRed
instance as a constructor parameter. That no longer makes sense, because for
the new API mapper/reducer the process handling is done by a separate class,
StreamingProcess. So the patch makes the following incompatible change (which
looks cleaner now):
*** Changes the OutputReader constructor to take a DataInput parameter instead
of PipeMapRed
*** Changes the InputWriter constructor to take a DataOutput parameter instead
of PipeMapRed
* Moves some utility methods in PipeMapRed to StreamUtil.
* Removes the deprecated StreamJob(String[] argv, boolean mayExit); deprecates
static public JobConf createJob(String[] argv); and adds static public Job
createStreamingJob(String[] argv).
* Refactors setJobConf() into multiple setters, each of which sets the
appropriate mapper/reducer in use.
* Adds unit tests for all the use cases described
[above|https://issues.apache.org/jira/browse/MAPREDUCE-1122?focusedCommentId=12878515&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12878515]
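
To illustrate the composition point and the constructor change described in the list, here is a minimal sketch. It is not code from the patch: all the Sketch* classes are hypothetical stand-ins written so that the example compiles without Hadoop on the classpath, and the in-memory buffer stands in for the child process's stdin. Only java.io.DataOutput is a real type. It assumes the mapper must extend a base class (as in the new API), so the process handling is held as a field rather than inherited, and the writer depends only on a DataOutput.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the new API Mapper, which is a class rather than
// an interface, so a subclass cannot also extend a process-handling class.
abstract class SketchMapper<K, V> {
    abstract void map(K key, V value) throws IOException;
}

// Hypothetical writer: its constructor takes a DataOutput, so it does not
// need to know which class owns the streaming process.
class SketchInputWriter {
    private final DataOutput out;

    SketchInputWriter(DataOutput out) {
        this.out = out;
    }

    void write(String key, String value) throws IOException {
        out.write((key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8));
    }
}

// Hypothetical stand-in for the process-handling class. A real one would
// start the child process and its output/error threads; here an in-memory
// buffer plays the role of the child's stdin.
class SketchProcess {
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private final DataOutputStream toChild = new DataOutputStream(sink);

    DataOutput childStdin() {
        return toChild;
    }

    String captured() throws IOException {
        toChild.flush();
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }
}

// The mapper extends the mapper base class and composes the process handler.
class SketchStreamingMapper extends SketchMapper<String, String> {
    private final SketchProcess process = new SketchProcess();
    private final SketchInputWriter writer =
        new SketchInputWriter(process.childStdin());

    @Override
    void map(String key, String value) throws IOException {
        writer.write(key, value);
    }

    SketchProcess process() {
        return process;
    }
}

class CompositionSketch {
    public static void main(String[] args) throws IOException {
        SketchStreamingMapper mapper = new SketchStreamingMapper();
        mapper.map("k1", "v1");
        System.out.print(mapper.process().captured());
    }
}
```

Because the writer only sees a DataOutput, the same writer class could in principle be used by either the old or the new mapper/reducer, whichever class owns the process.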
> streaming with custom input format does not support the new API
> ---------------------------------------------------------------
>
> Key: MAPREDUCE-1122
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: contrib/streaming
> Affects Versions: 0.20.1
> Environment: any OS
> Reporter: Keith Jackson
> Assignee: Amareshwari Sriramadasu
> Attachments: patch-1122.txt
>
>
> When trying to implement a custom input format for use with streaming, I have
> found that streaming does not support the new API,
> org.apache.hadoop.mapreduce.InputFormat, but requires the old API,
> org.apache.hadoop.mapred.InputFormat.