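In streaming, the reducer receives the sorted map output as individual <key, value> pairs, one per line, so you don't see a key followed by a list of values. Lines with the same key arrive consecutively, so the reducer has to group them itself. I think it is done this way to be symmetrical with the mapper, though I don't know the exact reason.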
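A minimal sketch of a Python reducer that does this grouping, in case it helps. It assumes the default streaming record format (key and value separated by the first tab) and that the values are integers, as in the sample output; the function names `parse` and `group_values` are made up for illustration.

```python
#!/usr/bin/env python
import sys
from itertools import groupby


def parse(lines):
    """Yield (key, value) pairs from tab-separated streaming input lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumption: default streaming format, key ends at the first tab.
        key, _, value = line.partition("\t")
        yield key, value


def group_values(lines):
    """Yield (key, [values]) for each run of consecutive identical keys."""
    # groupby only merges adjacent items, which is enough here because
    # the reducer input is already sorted by key.
    for key, pairs in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key, [value for _, value in pairs]


if __name__ == "__main__":
    for key, values in group_values(sys.stdin):
        # Values for a key arrive in no guaranteed order, so sort them here.
        # Assumption: every value parses as an integer.
        values.sort(key=int)
        sys.stdout.write("%s\t%s\n" % (key, ", ".join(values)))
```

Passing this script via "-reducer" instead of /bin/cat should give one output line per key, with the values joined by commas.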
Thanks
Amareshwari

On 7/14/10 1:05 PM, "Moritz Krog" <[email protected]> wrote:

Hi everyone,

I'm pretty new to Hadoop and generally avoid Java wherever I can, so I'm getting started with Hadoop streaming and a Python mapper and reducer. From what I read in the mapreduce tutorial, the mapper and reducer can be plugged into Hadoop via the "-mapper" and "-reducer" options on job start. I was wondering what the input for the reducer would look like, so I ran a Hadoop job using my own mapper but /bin/cat as the reducer. As you can see, the output of the job is ordered, but the keys haven't been combined:

{'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'} 107488
{'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'} 95560
{'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'} 95562

I would have expected something like:

{'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'} 95560, 95562, 107488

My understanding from the tutorial was that this reduction is part of the shuffle and sort phase. Or do I need to use a combiner to get that done? Does Hadoop streaming even do this, or do I need to use a native Java class?

Best,
Moritz
