[ https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865708#action_12865708 ]
Jaideep commented on MAPREDUCE-1122:
------------------------------------

Some changes are needed in order to support this.

* Everywhere in StreamJob, o.a.h.mapred.JobConf is used. To allow new input and output formats, the new o.a.h.mapreduce.Job object should be used instead. Alternatively, we can create and set the configuration without relying on JobConf or Job methods, and only create a JobConf or Job object depending on whether the old or the new API is being used.
* PipeMapper and PipeReducer are also based on the old API. We will have to create new Mappers and Reducers based on the new API in order to support the newer input and output formats. PipeMapRed also uses JobConf in a number of places. Almost all of these calls could be replaced by calls on a Configuration object.
* StreamInputFormat extends o.a.h.mapred.KeyValueTextInputFormat. It should extend o.a.h.mapreduce.lib.input.KeyValueTextInputFormat.
* StreamBaseRecordReader extends o.a.h.mapred.RecordReader. A new class conforming to the new API is needed.
* Some static methods in StreamUtil.java use the old API:
  - getCurrentSplit - uses o.a.h.mapred.FileSplit and JobConf. This method is not used anywhere else in the code.
  - isLocalJobTracker - uses JobConf.
  - getTaskInfo - uses JobConf to get the type of a task and its taskid. Used in PipeMapRed.setStreamJobDetails to set the taskid.
  - addJobConfToEnvironment - takes a JobConf as argument. Should also take a Job.

There is a static TaskID class in StreamUtil.java as well. If it's not needed, can it be removed?
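For the addJobConfToEnvironment item, one possible shape is a helper that depends only on the key/value pairs rather than on JobConf or Job. Configuration is an Iterable<Map.Entry<String, String>>, so both a JobConf and the Configuration held by a Job could be passed to such a method. The sketch below is illustrative only: the class and method names (EnvExport, addConfToEnvironment, safeEnvVarName) and the rule of replacing non-alphanumeric characters with '_' are assumptions, not the existing streaming code, and a plain Map stands in for the Hadoop configuration so the example runs without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EnvExport {

    // Replace every character that is not an ASCII letter or digit with '_'
    // so a property name such as "mapred.task.id" can be used as an
    // environment variable name. (Assumed naming rule for this sketch.)
    public static String safeEnvVarName(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (char c : name.toCharArray()) {
            boolean ok = (c >= '0' && c <= '9')
                      || (c >= 'A' && c <= 'Z')
                      || (c >= 'a' && c <= 'z');
            sb.append(ok ? c : '_');
        }
        return sb.toString();
    }

    // Copies every entry into env under a sanitized name. The parameter type
    // is deliberately API-neutral: anything that iterates over String
    // key/value entries is accepted.
    public static void addConfToEnvironment(
            Iterable<Map.Entry<String, String>> conf,
            Map<String, String> env) {
        for (Map.Entry<String, String> e : conf) {
            env.put(safeEnvVarName(e.getKey()), e.getValue());
        }
    }

    public static void main(String[] args) {
        // Stand-in for a Hadoop configuration; the keys are example values.
        Map<String, String> conf = new LinkedHashMap<String, String>();
        conf.put("mapred.task.id", "attempt_0001_m_000000_0");
        conf.put("stream.map.input", "text");

        Map<String, String> env = new LinkedHashMap<String, String>();
        addConfToEnvironment(conf.entrySet(), env);
        System.out.println(env);
    }
}
```

With a signature like this, the old-API and new-API code paths would differ only in how they obtain the configuration, not in how it is exported to the child process.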
> streaming with custom input format does not support the new API
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-1122
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.20.1
>         Environment: any OS
>            Reporter: Keith Jackson
>
> When trying to implement a custom input format for use with streaming, I have
> found that streaming does not support the new API,
> org.apache.hadoop.mapreduce.InputFormat, but requires the old API,
> org.apache.hadoop.mapred.InputFormat.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.