Tutorial should mention SetMapOutputKeyClass
--------------------------------------------
Key: MAPREDUCE-2064
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2064
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: documentation
Affects Versions: 0.21.0
Reporter: Clarence Gardner
Priority: Minor
The official tutorial (mapred_tutorial.html), like every other tutorial I've seen
on the web, shows a program whose mapper and reducer emit key/value pairs of the
same types. It includes a configuration call to Job.setOutput{Key,Value}Class but
doesn't say that this call applies to both the mapper and the reducer; as written,
it sounds as if it refers only to the reducer output. This could be mentioned in
the "Job Configuration" section. Here is a possible addition, to follow the
"The Job is used to specify ..." paragraph:
The job also configures the types of its key/value pairs with
setOutputKeyClass(type) and setOutputValueClass(type), which apply to both the
mapper and reducer classes. If the mapper's output types differ from the
reducer's, those calls should be followed by setMapOutputKeyClass(type) and
setMapOutputValueClass(type).
(I'm assuming that at least a call to setOutput{Key,Value}Class is required.)
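A minimal sketch of the rule the proposed text describes, assuming a job with a reduce phase: when no map output class is set, the map side is expected to emit the job's final output classes, so a mapper that emits IntWritable values for a reducer that emits DoubleWritable values fails a per-record type check. This is a hypothetical, stdlib-only Java model, NOT Hadoop code. The nested Text, IntWritable and DoubleWritable classes are empty stand-ins, and resolveMapOutputClass and checkEmitted are illustrative names. In Hadoop itself, JobConf.getMapOutputKeyClass() and getMapOutputValueClass() appear to implement this fallback.

```java
/**
 * Hypothetical, stdlib-only model of how map output types are resolved.
 * This is NOT Hadoop code; the nested classes are empty stand-ins for
 * Hadoop's Writable types, used only to keep the sketch self-contained.
 */
public class MapOutputTypeSketch {

    static final class Text {}
    static final class IntWritable {}
    static final class DoubleWritable {}

    /**
     * Returns the class the map side is expected to emit: the explicitly set
     * map output class if there is one, otherwise the job's output class.
     */
    static Class<?> resolveMapOutputClass(Class<?> mapOutputClass, Class<?> jobOutputClass) {
        return mapOutputClass != null ? mapOutputClass : jobOutputClass;
    }

    /**
     * Mimics a per-record check on map output: the emitted value's class must
     * match the expected class exactly (wording loosely modeled on Hadoop's).
     */
    static void checkEmitted(Object value, Class<?> expected) {
        if (value.getClass() != expected) {
            throw new IllegalStateException("Type mismatch in value from map: expected "
                    + expected.getSimpleName() + ", received "
                    + value.getClass().getSimpleName());
        }
    }

    public static void main(String[] args) {
        // The reducer emits (Text, DoubleWritable); the mapper emits (Text, IntWritable).
        Class<?> jobOutputValue = DoubleWritable.class; // as if setOutputValueClass(DoubleWritable.class)
        Object mapValue = new IntWritable();

        // Case 1: only setOutputValueClass was called, so the map side expects DoubleWritable.
        Class<?> expected = resolveMapOutputClass(null, jobOutputValue);
        try {
            checkEmitted(mapValue, expected);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // Case 2: as if setMapOutputValueClass(IntWritable.class) was also called; the check passes.
        expected = resolveMapOutputClass(IntWritable.class, jobOutputValue);
        checkEmitted(mapValue, expected);
        System.out.println("map output value class: " + expected.getSimpleName());
    }
}
```

Running main prints "Type mismatch in value from map: expected DoubleWritable, received IntWritable" for the first case and "map output value class: IntWritable" for the second.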