Re: Value-Only Reduce Output

Rasit OZDAS Wed, 04 Feb 2009 06:16:09 -0800

I tried it myself, and it doesn't work.
I also tried the stream.map.output.field.separator and
map.output.key.field.separator parameters for this purpose; they
don't work either. When Hadoop sees an empty string, it uses the default tab
character instead.
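
To illustrate where the trailing tab seems to come from, here is a minimal
Python sketch of the behaviour described in this thread. It is a simplified
model under my own assumptions, not Hadoop's actual code: the function names
split_streaming_line and write_text_line are hypothetical, and the sketch
assumes that streaming splits each output line at the first tab and that the
text writer emits the separator whenever both key and value are non-null.

```python
DEFAULT_SEPARATOR = "\t"  # assumed default key/value separator


def split_streaming_line(line, separator=DEFAULT_SEPARATOR):
    """Split one line of script output at the first separator.

    Hypothetical model: a line without a separator becomes the key,
    and the value is an empty string (not None).
    """
    key, _, value = line.partition(separator)
    return key, value


def write_text_line(key, value, separator=DEFAULT_SEPARATOR):
    """Model a text line writer that skips null (None) fields.

    The separator is written only when both key and value are non-null,
    so an empty-string value still gets a separator in front of it.
    """
    if key is None and value is None:
        return ""
    parts = []
    if key is not None:
        parts.append(str(key))
    if key is not None and value is not None:
        parts.append(separator)
    if value is not None:
        parts.append(str(value))
    return "".join(parts) + "\n"


if __name__ == "__main__":
    # A streaming reducer prints "42" with no tab: value becomes "".
    key, value = split_streaming_line("42")
    print(repr(write_text_line(key, value)))  # '42\t\n' (trailing tab)

    # A Java reducer calling output.collect(key, null): no separator.
    print(repr(write_text_line("42", None)))  # '42\n'
```

In this model the empty-string value is what triggers the separator, which
would explain why passing null from Java avoids the tab while a streaming
script cannot.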


Rasit

2009/2/4 jason hadoop <jason.had...@gmail.com>
>
> Oops, you are using streaming, and I am not familiar with it.
> As a terrible hack, you could set mapred.textoutputformat.separator to the
> empty string, in your configuration.
>
> On Tue, Feb 3, 2009 at 9:26 PM, jason hadoop <jason.had...@gmail.com> wrote:
>
> > If you are using the standard TextOutputFormat, and the output collector is
> > passed a null for the value, there will not be a trailing tab character
> > added to the output line.
> >
> > output.collect( key, null );
> > Will give you the behavior you are looking for if your configuration is as
> > I expect.
> >
> >
> > On Tue, Feb 3, 2009 at 7:49 PM, Jack Stahl <j...@yelp.com> wrote:
> >
> >> Hello,
> >>
> >> I'm interested in a map-reduce flow where I output only values (no keys) in
> >> my reduce step.  For example, imagine the canonical word-counting program
> >> where I'd like my output to be an unlabeled histogram of counts instead of
> >> (word, count) pairs.
> >>
> >> I'm using HadoopStreaming (specifically, I'm using the dumbo module to run
> >> my Python scripts).  When I simulate the map reduce using pipes and sort in
> >> bash, it works fine.  However, in Hadoop, if I output a value with no tabs,
> >> Hadoop appends a trailing "\t", apparently interpreting my output as a
> >> (value, "") KV pair.  I'd like to avoid outputting this trailing tab if
> >> possible.
> >>
> >> Is there a command line option that could be used to effect this?  More
> >> generally, is there something wrong with outputting arbitrary strings,
> >> instead of key-value pairs, in your reduce step?
> >>
> >
> >



--
M. Raşit ÖZDAŞ
