[
https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amareshwari Sriramadasu updated MAPREDUCE-1122:
-----------------------------------------------
Attachment: patch-1122-1.txt
Patch is updated to trunk with most of the review comments incorporated. Patch
should be applied on top of MAPREDUCE-1905 to pass all tests.
bq. It'd be really good if we can separate the new classes into new packages,
library classes into a lib package and implementation classes to an impl
package?
Done
bq. There are two ways of handling the skipping of bad records in the new api
...........
Removed the dead code related to skipping in new api classes. Will add a
subtask to MAPREDUCE-1932 to add support for streaming.
StreamingReducer.java
bq. Not logging exit code when exceptions happen in reduce. Used to be the case
in old code.
Exit code is already logged in StreamingProcessManager. Even in old code, it
was getting logged twice.
bq. How about passing configuration to InputWriter.initialize()
and let TextInputWriter/TextOutputReader maintain themselves the key/value
separators and related information instead of polluting StreamingMapper and
StreamingReducer?
Did not do this. It makes the code more complicated, because the mapper and
the reducer use different configuration parameter names.
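To illustrate the point about differing parameter names, here is a minimal
sketch. It is not taken from the patch. It assumes the streaming separator keys
follow the stream.map.* / stream.reduce.* naming with a tab default, and it uses
a plain Map in place of the Hadoop Configuration. The class SeparatorLookup and
the Role enum are hypothetical names used only for this example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: hypothetical helper, not part of the patch. */
class SeparatorLookup {
    /** Which side of the job the writer or reader is serving. */
    enum Role { MAP, REDUCE }

    /** Builds the role-specific key, e.g. "stream.map.input.field.separator". */
    static String inputSeparatorKey(Role role) {
        return "stream." + role.name().toLowerCase(Locale.ROOT)
                + ".input.field.separator";
    }

    /** Looks up the separator for the given role, falling back to a tab. */
    static String inputSeparator(Map<String, String> conf, Role role) {
        return conf.getOrDefault(inputSeparatorKey(role), "\t");
    }

    public static void main(String[] args) {
        // A plain Map stands in for the job configuration (assumption).
        Map<String, String> conf = new HashMap<>();
        conf.put("stream.map.input.field.separator", ",");

        // The same lookup needs the role to pick the right key.
        System.out.println("map key:    " + inputSeparatorKey(Role.MAP));
        System.out.println("reduce key: " + inputSeparatorKey(Role.REDUCE));
    }
}
```

A writer initialized only with the configuration would still have to be told
whether it serves the map or the reduce side before it could pick a key, which
is the extra complexity referred to above.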
Autoinputformat2
bq. No configure method like in AutoInputFormat?
The new API does not have a configure method for InputFormat.
StreamJob.java
bq. Is the compatibility left in one release?
Yes. All the removed deprecated methods have been deprecated since release 0.19.
TrApp.java
bq. Some expect() and expectDefined() calls are dropped. I could understand why
the ones related to output format are dropped to accommodate testing both new
and old apis. But removing of the checks related to input file and file length
didn't make sense to me.
New api does not have the configuration parameters for input file and length
(HADOOP-5973).
bq. Should we make the initialize methods in InputWriter and OutputReader
abstract now?
Did not do this. I don't think it is required.
Patch incorporates all other comments.
> streaming with custom input format does not support the new API
> ---------------------------------------------------------------
>
> Key: MAPREDUCE-1122
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: contrib/streaming
> Affects Versions: 0.20.1
> Environment: any OS
> Reporter: Keith Jackson
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1122-1.txt, patch-1122.txt
>
>
> When trying to implement a custom input format for use with streaming, I have
> found that streaming does not support the new API,
> org.apache.hadoop.mapreduce.InputFormat, but requires the old API,
> org.apache.hadoop.mapred.InputFormat.