[jira] Commented: (MAPREDUCE-1772) Hadoop streaming doc should not use IdentityMapper as an example

Amareshwari Sriramadasu (JIRA) Wed, 07 Jul 2010 22:52:31 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886247#action_12886247 ]


Amareshwari Sriramadasu commented on MAPREDUCE-1772:
----------------------------------------------------

Karam reported some more issues with the streaming documentation. I am posting 
them here on his behalf. The issues are:
# In many examples, system commands such as /bin/cat or /bin/wc are used. Can we 
change them to 'cat' or 'wc' instead of prefixing them with '/bin', since these 
utilities do not always reside in /bin? In particular, wc may not be in /bin 
at all; it can be at /usr/bin/wc instead.
# In some places, the usage is given as "bin/hadoop command [genericOptions] 
[streamingOptions]". It should be changed to "$HADOOP_HOME/bin/hadoop command 
[genericOptions] [streamingOptions]".
# There are places where we set -D mapred.reduce.tasks=0 even though 
-numReduceTasks does the same thing. From the user's point of view, it would be 
good either to replace -D mapred.reduce.tasks=0 with -numReduceTasks 0, or to 
mention in the usage that -numReduceTasks and -D mapred.reduce.tasks are 
equivalent and that either of them can be used to specify the number of 
reduces. We also need to add a note that, because -numReduceTasks is a 
streaming option and appears after the generic options, it overrides 
-D mapred.reduce.tasks if both are specified in the command.
# In the examples for KeyFieldBasedPartitioner and KeyFieldBasedComparator, we 
should change the mapper from IdentityMapper to cat. With IdentityMapper, the 
map output key is written as well and it is a LongWritable, so the result is 
not what is expected.
# {code}
c2='cut -f2'; $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -D mapred.job.name='Experiment' \
    -input /user/me/samples/student_marks \
    -output /user/me/samples/student_out \
    -mapper \"$c2\" -reducer 'cat'
{code}
In the above example, -mapper \"$c2\" should be written as -mapper "$c2"; 
otherwise streaming interprets -f2 as an invalid option and fails.
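
The quoting problem in the last item can be reproduced without Hadoop. The 
following is only a minimal sketch, assuming a POSIX shell with the default 
IFS; count_args is a hypothetical helper introduced here for illustration and 
is not part of streaming. It counts how many words the shell hands over for 
each quoting style:

```shell
#!/bin/sh
# Same variable as in the documentation example.
c2='cut -f2'

# Hypothetical helper: prints the number of arguments it received.
count_args() { echo "$#"; }

# \" is a literal quote character, so $c2 is expanded unquoted and is split
# on the space. The words passed on are: -mapper, "cut and -f2"
escaped=$(count_args -mapper \"$c2\")

# "$c2" is a quoted expansion, so 'cut -f2' stays a single word.
# The words passed on are: -mapper and cut -f2
quoted=$(count_args -mapper "$c2")

echo "escaped form: $escaped arguments"
echo "quoted form:  $quoted arguments"
```

With the escaped form, the word beginning with -f2 arrives as a separate 
argument, which is consistent with streaming rejecting it as an option.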



> Hadoop streaming doc should not use IdentityMapper as an example
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1772
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1772
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming, documentation
>            Reporter: Marco Nicosia
>            Priority: Minor
>
> From the URL http://hadoop.apache.org/core/docs/current/streaming.html
> This example doesn't work:
> {quote}
> You can supply a Java class as the mapper and/or the reducer. The above 
> example is equivalent to:
> $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
>     -input myInputDirs \
>     -output myOutputDir \
>     -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
>     -reducer /bin/wc
> {quote}
> This will produce the following exception:
> {quote}
> java.io.IOException: Type mismatch in key from map: expected 
> org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
