[
https://issues.apache.org/jira/browse/MAPREDUCE-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Harsh J Chouraria updated MAPREDUCE-2410:
-----------------------------------------
Attachment: MAPREDUCE-2410.r1.diff
Dieter,
I've attached a patch that adds a documentation entry to the streaming FAQ
page.
Let me know if the following is sufficient (it's what the patch contains as
well):
{code}
+<section>
+<title>How does the use of streaming differ from the Java MapReduce
API?</title>
+<p>
+ The Java MapReduce API is a higher-level API that lets the developer
focus on writing map and reduce functions that act upon a key and its
associated value(s). The Java API takes care of iterating over the data
source behind the scenes.
+ In streaming, the framework feeds the input data to the mapper/reducer
program over stdin, so these programs have to be written at the level of the
read loop over stdin.
+</p>
+</section>
{code}
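To illustrate the difference, here is a minimal sketch of a streaming reducer in Python. It is not part of the patch. It assumes the default tab-separated "key<TAB>value" lines on stdin, already sorted by key, and (purely for the example) integer values that should be summed per key. The names parse and reduce_stream are made up for illustration.

```python
import sys
from itertools import groupby


def parse(lines, sep="\t"):
    """Split each input line into a (key, value) pair at the first separator."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition(sep)
        yield key, value


def reduce_stream(lines, sep="\t"):
    """Yield (key, total) for each run of consecutive lines sharing a key.

    The script sees every key assigned to this reducer, one after another,
    so it has to detect the key boundaries itself. groupby does that here,
    relying on the input being sorted by key.
    Assumption for this example: values are integers to be summed.
    """
    pairs = parse(lines, sep)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)


if __name__ == "__main__":
    for key, total in reduce_stream(sys.stdin):
        print(f"{key}\t{total}")
```

With the Java API the framework would call reduce() once per key. Here the loop over stdin and the grouping both live in the script.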
> document multiple keys per reducer oddity in hadoop streaming FAQ
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-2410
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2410
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: contrib/streaming, documentation
> Reporter: Dieter Plaetinck
> Priority: Minor
> Labels: newbie
> Attachments: MAPREDUCE-2410.r1.diff
>
> Original Estimate: 40m
> Remaining Estimate: 40m
>
> Hi,
> for a newcomer to hadoop streaming, it comes as a surprise that the reducer
> receives arbitrary keys, unlike the "real" hadoop where a reducer works on a
> single key.
> An explanation for this is at
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/browser
> I suggest adding this to the hadoop streaming FAQ.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira