[ https://issues.apache.org/jira/browse/MAPREDUCE-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031694#comment-13031694 ]
Dieter Plaetinck commented on MAPREDUCE-2410:
---------------------------------------------
I think that's very well and concisely explained, but to make it really clear
to beginners I would add after the last line:
"A practical consequence of this is that reducers for streaming need to be able
to deal with different input keys"
Or even:
"A practical consequence of this is that reducers for streaming need to be able
to deal with different input keys, although some projects, such as dumbo for
Python programmers [*], provide a per-key abstraction similar to the native API
on top of the streaming API"
[*] https://github.com/klbostee/dumbo/wiki/Short-tutorial
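
To illustrate what "dealing with different input keys" means in practice, here
is a minimal sketch of a streaming reducer in Python. It assumes the default
tab-separated "key<TAB>value" lines, integer values, and input that arrives
sorted by key, so that all lines for one key are adjacent. The function names
(parse, reduce_stream, main) are hypothetical and chosen only for this example;
summing the values is just a placeholder for whatever per-key work is needed.

```python
import sys
from itertools import groupby


def parse(lines, sep="\t"):
    """Yield (key, value) pairs from 'key<sep>value' input lines."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition(sep)
        yield key, value


def reduce_stream(lines, sep="\t"):
    """Yield (key, total) for each run of consecutive lines sharing a key.

    Assumption: the input is sorted by key, so one run equals one key.
    """
    pairs = parse(lines, sep)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        # 'group' holds only the pairs for the current key; a new key
        # in the stream starts a new group.
        yield key, sum(int(value) for _, value in group)


def main(stdin=None, stdout=None):
    """Read reducer input from stdin and write 'key<TAB>total' lines."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    for key, total in reduce_stream(stdin):
        stdout.write(f"{key}\t{total}\n")
```

To use this as a reducer script, call main() at the bottom of the file. The
groupby call does the work that a per-key reduce method would otherwise do: it
detects where one key ends and the next begins in the single input stream.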
> document multiple keys per reducer oddity in hadoop streaming FAQ
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-2410
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2410
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: contrib/streaming, documentation
> Affects Versions: 0.20.2
> Reporter: Dieter Plaetinck
> Assignee: Harsh J Chouraria
> Priority: Minor
> Labels: newbie
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2410.r1.diff
>
> Original Estimate: 40m
> Remaining Estimate: 40m
>
> Hi,
> for a newcomer to Hadoop streaming, it comes as a surprise that the reducer
> receives arbitrary keys, unlike in "real" Hadoop, where each reduce call works
> on a single key.
> An explanation for this is at
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/browser
> I suggest adding this to the FAQ of Hadoop streaming.