First of all, thanks for the quick answer :) Is there any way to configure the job so that I get the key -> value list? I specifically need exactly this behavior; it's crucial to what I want to do with Hadoop.

On Wed, Jul 14, 2010 at 10:06 AM, Amareshwari Sri Ramadasu <[email protected]> wrote:

> In streaming, the combined values are given to the reducer as <key, value>
> pairs again, so you don't see a key and a list of values.
> I think it is done that way to be symmetrical with the mapper, though I
> don't know the exact reason.
>
> Thanks
> Amareshwari
>
> On 7/14/10 1:05 PM, "Moritz Krog" <[email protected]> wrote:
>
> > Hi everyone,
> >
> > I'm pretty new to Hadoop and generally avoid Java wherever I can, so
> > I'm getting started with Hadoop streaming and a Python mapper and
> > reducer. From what I read in the MapReduce tutorial, the mapper and
> > reducer can be plugged into Hadoop via the "-mapper" and "-reducer"
> > options on job start. I was wondering what the input for the reducer
> > would look like, so I ran a Hadoop job using my own mapper but /bin/cat
> > as the reducer. As you can see, the output of the job is ordered, but
> > the keys haven't been combined:
> >
> > {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'} 107488
> > {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'} 95560
> > {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'} 95562
> >
> > I would have expected something like:
> >
> > {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'} 95560, 95562, 107488
> >
> > My understanding from the tutorial was that this reduction is part of
> > the shuffle and sort phase. Or do I need to use a combiner to get that
> > done? Does Hadoop streaming even do this, or do I need to use a native
> > Java class?
> >
> > Best,
> > Moritz
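
Because the reducer's input is sorted by key, all lines for one key arrive next to each other, so the reducer script itself can rebuild the key -> value list. A minimal sketch of such a Python streaming reducer, assuming Python 3 and streaming's default of a tab between key and value; the function names `parse`, `group_values` and `main` are hypothetical:

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop streaming reducer that regroups sorted key/value lines."""
import sys
from itertools import groupby


def parse(lines):
    # Assumes the default streaming format: the key is everything up to the
    # first tab, the value is the rest. A line with no tab becomes a key
    # with an empty value.
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, value = line.partition("\t")
        yield key, value


def group_values(lines):
    # Input is sorted by key, so equal keys are adjacent; groupby collects
    # each consecutive run into (key, [value, value, ...]).
    for key, pairs in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key, [value for _, value in pairs]


def main(stdin=sys.stdin, stdout=sys.stdout):
    for key, values in group_values(stdin):
        # Per-key reduce logic would go here; this sketch just joins the list.
        stdout.write(f"{key}\t{', '.join(values)}\n")


if __name__ == "__main__":
    main()
```

The script would be passed with the "-reducer" option in place of /bin/cat. Note that only the keys are sorted; the order of values within one key is not guaranteed, which is why the sample output above lists 107488 before 95560.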
