[
https://issues.apache.org/jira/browse/MAPREDUCE-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886247#action_12886247
]
Amareshwari Sriramadasu commented on MAPREDUCE-1772:
----------------------------------------------------
Karam reported some more issues with the streaming documentation. I am posting
them here on his behalf. The issues are:
# Many examples use system commands with absolute paths, such as /bin/cat or
/bin/wc. Can we change them to 'cat' or 'wc', without the '/bin' prefix, since
these utilities do not always reside in /bin? In particular, wc is often not
in /bin at all; it is at /usr/bin/wc.
# In some places the usage is given as "bin/hadoop command [genericOptions]
[streamingOptions]". It should be changed to "$HADOOP_HOME/bin/hadoop command
[genericOptions] [streamingOptions]".
# In some places we set -D mapred.reduce.tasks=0 even though -numReduceTasks
exists for the same purpose. From the user's point of view, it would be better
either to replace -D mapred.reduce.tasks=0 with -numReduceTasks 0, or to
mention in the usage that -numReduceTasks and -D mapred.reduce.tasks are
equivalent and that either of them can be used to specify the number of
reduces. We also need to add a note that, because -numReduceTasks is a
streaming option and appears after the generic options, it overrides
-D mapred.reduce.tasks if both are specified in the command.
# In the examples for KeyFieldBasedPartitioner and KeyFieldBasedComparator, we
should change the mapper from IdentityMapper to cat. With IdentityMapper, the
map output key is written as well and is a LongWritable, so the result is not
the one the documentation says to expect.
# {code}
c2='cut -f2'; $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-D mapred.job.name='Experiment' \
-input /user/me/samples/student_marks \
-output /user/me/samples/student_out \
-mapper \"$c2\" -reducer 'cat'
{code}
In the above example, -mapper \"$c2\" should be written as -mapper "$c2";
otherwise streaming interprets -f2 as an invalid option and fails.
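
The quoting problem in the last item can be reproduced without Hadoop. The
following is a minimal sketch, assuming a POSIX shell with the default IFS.
show_args is a hypothetical helper (not part of Hadoop) that prints the
arguments it receives, standing in for the streaming command line.

```shell
#!/bin/sh
# Hypothetical helper: prints the argument count, then each argument in brackets.
show_args() {
    echo "$# argument(s)"
    for arg in "$@"; do
        printf '  [%s]\n' "$arg"
    done
}

c2='cut -f2'

# Escaped quotes: the backslashes turn the quotes into literal characters, so
# $c2 is expanded unquoted and split on whitespace into two separate words.
show_args -mapper \"$c2\"

# Plain double quotes: the expansion of $c2 stays a single word.
show_args -mapper "$c2"
```

The first call passes three arguments (-mapper, "cut and -f2"), so the word
after the mapper looks like a separate option. The second call passes -mapper
followed by the single argument cut -f2.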
> Hadoop streaming doc should not use IdentityMapper as an example
> ----------------------------------------------------------------
>
> Key: MAPREDUCE-1772
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1772
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: contrib/streaming, documentation
> Reporter: Marco Nicosia
> Priority: Minor
>
> From the URL http://hadoop.apache.org/core/docs/current/streaming.html
> This example doesn't work:
> {quote}
> You can supply a Java class as the mapper and/or the reducer. The above
> example is equivalent to:
> $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
> -input myInputDirs \
> -output myOutputDir \
> -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
> -reducer /bin/wc
> {quote}
> This will produce the following exception:
> {quote}
> java.io.IOException: Type mismatch in key from map: expected
> org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
> {quote}
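
For comparison, the intended behaviour of the example (an identity map followed
by wc as the reducer) can be approximated locally with an ordinary pipeline.
This is only a sketch, under the assumption that cat is used as the mapper in
place of IdentityMapper, as suggested in the comment above. The sort stage is
a rough stand-in for the shuffle, run_local is a hypothetical name, and the
sample input is invented for illustration. It does not run Hadoop.

```shell
#!/bin/sh
# Local approximation of a streaming job: mapper | sort (shuffle) | reducer.
# Assumption: 'cat' replaces IdentityMapper; the input lines are made up.
run_local() {
    printf 'banana\napple\ncherry\n' | cat | sort | wc -l
}

# wc pads its output with spaces on some systems, so strip them before printing.
run_local | tr -d ' '
```

Because cat only passes text lines through, no LongWritable key is involved
and the reducer sees plain text.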