Re: Hadoop Streaming

Amareshwari Sri Ramadasu Wed, 14 Jul 2010 01:09:12 -0700

In streaming, the combined values are passed to the reducer as individual 
<key, value> pairs again, one pair per line, so you don't see a key followed 
by a list of values.
I think it is done that way to be symmetrical with the mapper, though I don't 
know the exact reason.
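
A reducer script can rebuild the key / list-of-values view itself. Below is a
minimal sketch, not a definitive implementation. It assumes the default
streaming layout (key and value separated by the first tab) and that all lines
for a key arrive consecutively, which the sorted output quoted below suggests.
The names parse and group_by_key are made up for illustration.

```python
#!/usr/bin/env python
"""Sketch of a streaming reducer that regroups sorted <key, value> lines."""
import sys
from itertools import groupby


def parse(line):
    # Assumption: key and value are separated by the first tab on the line.
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def group_by_key(lines):
    """Yield (key, [values]) for each run of consecutive identical keys."""
    pairs = (parse(line) for line in lines if line.strip())
    # groupby only merges adjacent items, so it relies on the sorted input.
    for key, run in groupby(pairs, key=lambda kv: kv[0]):
        yield key, [value for _, value in run]


def main():
    # Emit one line per key: the key, a tab, then the comma-joined values.
    for key, values in group_by_key(sys.stdin):
        print("%s\t%s" % (key, ", ".join(values)))


if __name__ == "__main__":
    main()
```

Passed as the "-reducer" script, this would print one line per key. The values
appear in arrival order, since only the keys are sorted.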


Thanks
Amareshwari

On 7/14/10 1:05 PM, "Moritz Krog" <[email protected]> wrote:

Hi everyone,

I'm pretty new to Hadoop and generally avoid Java wherever I can, so
I'm getting started with Hadoop streaming and a Python mapper and reducer.
From what I read in the MapReduce tutorial, the mapper and reducer can be plugged
into Hadoop via the "-mapper" and "-reducer" options on job start. I was
wondering what the input for the reducer would look like, so I ran a Hadoop
job using my own mapper but /bin/cat as reducer. As you can see, the output
of the job is ordered, but the keys haven't been combined:

{'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type':
'person'}   107488
{'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type':
'person'}   95560
{'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type':
'person'}   95562

I would have expected something like:

{'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type':
'person'}   95560, 95562, 107488

My understanding from the tutorial was that this reduction is part of the
shuffle and sort phase. Or do I need to use a combiner to get that done?
Does Hadoop streaming even do this, or do I need to use a native Java class?

Best,
Moritz
