Re: Hadoop Streaming

Alex Kozlov Wed, 14 Jul 2010 01:52:56 -0700

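You can use the following Perl script as a reducer. It relies on Streaming
handing the reducer its input sorted by key, so all lines for one key are
adjacent: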

===
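#!/usr/bin/perl
use strict;
use warnings;

# Streaming feeds the reducer sorted "key<TAB>value" lines on stdin.
$, = "\t";   # output field separator between key and value list
$\ = "\n";   # output record separator, since input lines are chomped

my ($lastkey, @values);

while (<>) {
    chomp;                # keep the newline out of the joined value list
    next unless length;   # skip empty lines
    my ($key, $value) = split(/\t/, $_, 2);
    $value = "" unless defined $value;
    if (defined($lastkey) and $lastkey eq $key) {
        push @values, $value;
    } else {
        # Key changed: emit the group collected so far.
        print $lastkey, join(",", @values) if defined($lastkey);
        $lastkey = $key;
        @values = ($value);
    }
}

# Emit the final group.
print $lastkey, join(",", @values) if defined($lastkey);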
===
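
Since you mentioned Python: roughly the same grouping could be sketched with
itertools.groupby. This is only a sketch, assuming tab-separated input that
Streaming has already sorted by key; the names parse and group_values are made
up for illustration and are not part of Hadoop.

```python
#!/usr/bin/env python
"""Streaming reducer sketch: turn sorted key<TAB>value lines into key<TAB>v1,v2,..."""
import sys
from itertools import groupby


def parse(lines):
    """Yield (key, value) pairs from tab-separated lines (hypothetical helper)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        # Split on the first tab only; a line without a tab gets an empty value.
        key, _, value = line.partition("\t")
        yield key, value


def group_values(lines):
    """Yield one 'key<TAB>v1,v2,...' string per run of identical keys.

    Assumes the input is already sorted by key, which is how the reducer
    input arrives in the thread's /bin/cat experiment.
    """
    for key, pairs in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key + "\t" + ",".join(value for _, value in pairs)


if __name__ == "__main__":
    for record in group_values(sys.stdin):
        print(record)
```

Values come out in the order they arrive, so they are not sorted numerically
within a key; sort them inside group_values if that order matters.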

Alex K


On Wed, Jul 14, 2010 at 1:17 AM, Moritz Krog <[email protected]> wrote:

> First of all, thanks for the quick answer :)
>
> Is there any way to configure the job so that I get the key -> value list?
> I specifically need exactly this behavior. It's crucial to what I want to
> do with Hadoop.
>
>
> On Wed, Jul 14, 2010 at 10:06 AM, Amareshwari Sri Ramadasu <
> [email protected]> wrote:
>
> > In streaming, the combined values are given to the reducer as <key, value>
> > pairs again, so you don't see a key and a list of values.
> > I think it is done that way to be symmetrical with the mapper, though I
> > don't know the exact reason.
> >
> > Thanks
> > Amareshwari
> >
> > On 7/14/10 1:05 PM, "Moritz Krog" <[email protected]> wrote:
> >
> > Hi everyone,
> >
> > I'm pretty new to Hadoop and generally avoid Java wherever I can, so I'm
> > getting started with Hadoop streaming and a Python mapper and reducer.
> > From what I read in the MapReduce tutorial, the mapper and reducer can be
> > plugged into Hadoop via the "-mapper" and "-reducer" options on job start.
> > I was wondering what the input for the reducer would look like, so I ran a
> > Hadoop job using my own mapper but /bin/cat as the reducer. As you can see,
> > the output of the job is ordered, but the keys haven't been combined:
> >
> > {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   107488
> > {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   95560
> > {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   95562
> >
> > I would have expected something like:
> >
> > {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   95560, 95562, 107488
> >
> > My understanding from the tutorial was that this reduction is part of the
> > shuffle and sort phase. Or do I need to use a combiner to get that done?
> > Does Hadoop streaming even do this, or do I need to use a native Java
> > class?
> >
> > Best,
> > Moritz
> >
> >
>
