Re: Hadoop Streaming

Moritz Krog Wed, 14 Jul 2010 01:17:45 -0700

First of all, thanks for the quick answer :)

Is there any way to configure the job so that the reducer gets each key
with its list of values? I specifically need exactly this behavior; it's
crucial to what I want to do with Hadoop.
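Streaming hands the reducer sorted "key<TAB>value" lines instead of a key with a list of values. One way to get the key -> value list is to regroup consecutive lines that share a key inside the reducer itself. Below is a minimal Python sketch. It assumes the default tab separator and input already sorted by key. The function names are hypothetical and not part of Hadoop:

```python
from itertools import groupby


def parse_record(line):
    # Streaming's default convention: text up to the first tab is the key,
    # the rest of the line is the value.
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def group_sorted_lines(lines):
    """Yield (key, [values]) for each run of consecutive lines with the same key.

    Relies on the framework having sorted the reducer input by key, so all
    records for one key arrive next to each other.
    """
    records = (parse_record(line) for line in lines)
    for key, run in groupby(records, key=lambda record: record[0]):
        yield key, [value for _, value in run]


def format_record(key, values):
    # One output line per key, values joined as "95560, 95562, 107488".
    return "%s\t%s\n" % (key, ", ".join(values))


def reduce_stream(infile, outfile):
    # Read sorted key/value lines, write one grouped line per key.
    for key, values in group_sorted_lines(infile):
        outfile.write(format_record(key, values))
```

To use this as the `-reducer` script, the file would also need `import sys` and a final call such as `reduce_stream(sys.stdin, sys.stdout)`. Because `groupby` only looks at adjacent lines, the sketch holds just one key's values in memory at a time. That works only because the input arrives sorted.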



On Wed, Jul 14, 2010 at 10:06 AM, Amareshwari Sri Ramadasu <
[email protected]> wrote:

> In streaming, the values for a key are given to the reducer as <key, value>
> pairs again, so you don't see a key and its list of values.
> I think it is done that way to be symmetrical with the mapper, though I
> don't know the exact reason.
>
> Thanks
> Amareshwari
>
> On 7/14/10 1:05 PM, "Moritz Krog" <[email protected]> wrote:
>
> Hi everyone,
>
> I'm pretty new to Hadoop and generally avoid Java wherever I can, so
> I'm getting started with Hadoop Streaming and a Python mapper and reducer.
> From what I read in the MapReduce tutorial, the mapper and reducer can be
> plugged into Hadoop via the "-mapper" and "-reducer" options when starting
> a job. I was wondering what the input for the reducer would look like, so I
> ran a Hadoop job using my own mapper but /bin/cat as the reducer. As you
> can see, the output of the job is sorted, but the keys haven't been combined:
>
> {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   107488
> {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   95560
> {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   95562
>
> I would have expected something like:
>
> {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   95560, 95562, 107488
>
> My understanding from the tutorial was that this reduction is part of the
> shuffle and sort phase. Or do I need to use a combiner to get that done?
> Does Hadoop Streaming even do this, or do I need to use a native Java
> class?
>
> Best,
> Moritz
>
>
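The /bin/cat experiment above can be reproduced without a cluster. The following Python sketch is a rough, hypothetical stand-in for a single-reducer streaming job. It sorts mapper output by key, as the shuffle and sort phase does, but never merges values, so an identity reducer still sees one line per key/value pair. The mapper, its "name,id" input format, and all helper names are invented for illustration:

```python
def split_key(line):
    # Streaming's default: text before the first tab is the key.
    return line.partition("\t")[0]


def local_streaming_run(mapper, reducer, input_lines):
    """Hypothetical local stand-in for a single-reducer streaming job.

    mapper:  takes one input line, returns a list of "key<TAB>value" lines
    reducer: takes the key-sorted list of lines, returns output lines
    """
    mapped = [out for line in input_lines for out in mapper(line)]
    # Approximates shuffle and sort: order records by key only. Python's sort
    # is stable, so values for one key keep their mapper order here; Hadoop
    # itself makes no such promise about value order.
    shuffled = sorted(mapped, key=split_key)
    return reducer(shuffled)


def cat_reducer(lines):
    # Like "-reducer /bin/cat": each key/value line passes through unchanged.
    return list(lines)


def example_mapper(line):
    # Hypothetical mapper: input "name,id" becomes "name<TAB>id".
    name, record_id = line.strip().split(",")
    return ["%s\t%s" % (name, record_id)]


records = ["Adhikari,107488", "Smith,1", "Adhikari,95560"]
for out_line in local_streaming_run(example_mapper, cat_reducer, records):
    print(out_line)
```

The two Adhikari records come out adjacent but on separate lines, which matches the job output in the message above. Merging them into a single line is left to the reducer.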
