Re: Value-Only Reduce Output

jason hadoop Wed, 04 Feb 2009 07:20:45 -0800

For your reduce step, the parameter is stream.reduce.input.field.separator
if you are supplying a reduce class, and I believe the output format is
TextOutputFormat.


It looks like you have tried the map parameter for the separator, not the
reduce parameter.

From 0.19.0 PipeReducer:
configure:
      reduceOutFieldSeparator =
          job_.get("stream.reduce.output.field.separator", "\t").getBytes("UTF-8");
      reduceInputFieldSeparator =
          job_.get("stream.reduce.input.field.separator", "\t").getBytes("UTF-8");
      this.numOfReduceOutputKeyFields =
          job_.getInt("stream.num.reduce.output.key.fields", 1);

getInputSeparator:
  byte[] getInputSeparator() {
    return reduceInputFieldSeparator;
  }

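reduce:
        // ... (excerpt from the loop over a key's values)
          write(key);
          clientOut_.write(getInputSeparator());  // key/value separator sent to the script
          write(val);
          clientOut_.write('\n');
        } else {
          // "identity reduce"
          output.collect(key, val);  // pair passed straight to the output collector
        }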
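The piped branch of the excerpt can be modelled in a few lines of plain Java. This is a hypothetical sketch, not Hadoop code: the class `ReduceInputFraming` and its method `frame` are made-up names standing in for the loop over one key's values, and a `ByteArrayOutputStream` stands in for `clientOut_`, the stream that feeds the script's standard input. It shows that the separator written between key and value is the one returned by `getInputSeparator()`, i.e. `stream.reduce.input.field.separator`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Hypothetical model (not Hadoop code) of the piped branch in the excerpt:
 * for every value of a key, write key, separator, value and a newline to the
 * script's standard input.
 */
public class ReduceInputFraming {

    static byte[] frame(String key, List<String> values, String inputSeparator) {
        ByteArrayOutputStream clientOut = new ByteArrayOutputStream(); // stands in for clientOut_
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        byte[] sep = inputSeparator.getBytes(StandardCharsets.UTF_8);  // what getInputSeparator() returns
        for (String val : values) {
            byte[] valBytes = val.getBytes(StandardCharsets.UTF_8);
            clientOut.write(keyBytes, 0, keyBytes.length);  // write(key)
            clientOut.write(sep, 0, sep.length);            // clientOut_.write(getInputSeparator())
            clientOut.write(valBytes, 0, valBytes.length);  // write(val)
            clientOut.write('\n');                          // clientOut_.write('\n')
        }
        return clientOut.toByteArray();
    }

    public static void main(String[] args) {
        // "\t" is the default for stream.reduce.input.field.separator in the excerpt.
        byte[] stdin = frame("apple", List.of("1", "1", "1"), "\t");
        System.out.print(new String(stdin, StandardCharsets.UTF_8)); // three "apple<TAB>1" lines
    }
}
```

Each value in a key group produces its own input line, so a reduce script that wants to emit one line per key has to notice the key change itself.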


On Wed, Feb 4, 2009 at 6:15 AM, Rasit OZDAS <rasitoz...@gmail.com> wrote:

> I tried it myself; it doesn't work.
> I've also tried the stream.map.output.field.separator and
> map.output.key.field.separator parameters for this purpose, and they
> don't work either. When Hadoop sees an empty string, it uses the default
> tab character instead.
>
> Rasit
>
> 2009/2/4 jason hadoop <jason.had...@gmail.com>
> >
> > Oops, you are using streaming, and I am not familiar with it.
> > As a terrible hack, you could set mapred.textoutputformat.separator to
> > the empty string in your configuration.
> >
> > On Tue, Feb 3, 2009 at 9:26 PM, jason hadoop <jason.had...@gmail.com> wrote:
> >
> > > If you are using the standard TextOutputFormat and the output collector
> > > is passed a null for the value, no trailing tab character will be
> > > added to the output line.
> > >
> > > output.collect(key, null);
> > > will give you the behavior you are looking for if your configuration is
> > > as I expect.
> > >
> > >
> > > On Tue, Feb 3, 2009 at 7:49 PM, Jack Stahl <j...@yelp.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> I'm interested in a map-reduce flow where I output only values (no
> > >> keys) in my reduce step. For example, imagine the canonical
> > >> word-counting program, where I'd like my output to be an unlabeled
> > >> histogram of counts instead of (word, count) pairs.
> > >>
> > >> I'm using Hadoop Streaming (specifically, the dumbo module to run my
> > >> Python scripts). When I simulate the map-reduce in bash using pipes and
> > >> sort, it works fine. However, in Hadoop, if I output a value with no
> > >> tabs, Hadoop appends a trailing "\t", apparently interpreting my output
> > >> as a (value, "") key-value pair. I'd like to avoid outputting this
> > >> trailing tab if possible.
> > >>
> > >> Is there a command-line option that could be used to achieve this? More
> > >> generally, is there something wrong with outputting arbitrary strings,
> > >> instead of key-value pairs, in your reduce step?
> > >>
> > >
> > >
>
>
>
> --
> M. Raşit ÖZDAŞ
>
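The two halves of the thread can be tied together with a small, self-contained model in plain Java that uses no Hadoop classes. It is a sketch under stated assumptions, not Hadoop's implementation, and the class and method names are hypothetical. `splitKeyVal` follows the behaviour Jack describes: a script line with no separator becomes the key, and the value is an empty string. `formatLine` follows Jason's point about TextOutputFormat: the separator is written only when both key and value are non-null. The parameters mirror stream.reduce.output.field.separator and stream.num.reduce.output.key.fields from the excerpt.

```java
/**
 * Hypothetical model (not Hadoop code) of why a value-only streaming reducer
 * ends up with a trailing tab, and why a null value avoids it.
 */
public class TrailingTabSketch {

    /** Splits a script output line into {key, value}; no separator means value = "". */
    static String[] splitKeyVal(String line, String separator, int numKeyFields) {
        int pos = line.indexOf(separator);
        // Advance so that the key spans numKeyFields fields.
        for (int k = 1; k < numKeyFields && pos != -1; k++) {
            pos = line.indexOf(separator, pos + separator.length());
        }
        if (pos == -1) {
            return new String[] {line, ""};   // empty, but NOT null
        }
        return new String[] {line.substring(0, pos), line.substring(pos + separator.length())};
    }

    /** Lays out one record: the separator is written only when key and value are both non-null. */
    static String formatLine(String key, String value, String separator) {
        if (key == null && value == null) {
            return "";                        // nothing at all is written
        }
        StringBuilder line = new StringBuilder();
        if (key != null) {
            line.append(key);
        }
        if (key != null && value != null) {
            line.append(separator);
        }
        if (value != null) {
            line.append(value);
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        // The reduce script prints just a count, with no tab.
        String[] kv = splitKeyVal("42", "\t", 1);
        String streamed = formatLine(kv[0], kv[1], "\t");  // "42\t\n": the empty value still gets a tab
        String javaNull = formatLine("42", null, "\t");    // "42\n": the output.collect(key, null) case
        System.out.println(streamed.replace("\t", "<TAB>").trim());
        System.out.println(javaNull.trim());
    }
}
```

The empty-but-non-null value is what produces the trailing tab. An empty mapred.textoutputformat.separator would also remove it in this model. According to Rasit, however, Hadoop falls back to the tab when it sees an empty string.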
