If you are using the standard TextOutputFormat, and the output collector is
passed a null for the value, there will not be a trailing tab character
added to the output line.

Calling

output.collect( key, null );

will give you the behavior you are looking for, provided your job is
configured as I expect (that is, with TextOutputFormat as the output format).
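
To make the null handling concrete, here is a rough sketch of how I understand
the line writer in TextOutputFormat to decide whether to emit the separator.
This is a simplified stand-alone model, not the Hadoop source: the class name
TextLineSketch and the method formatLine are made up for illustration, and the
real writer works on Writable objects (and, as far as I recall, also treats
NullWritable as null).

```java
// Simplified, hypothetical model of how a TextOutputFormat-style writer
// builds one output line. Not actual Hadoop code.
class TextLineSketch {
    // Assumed default key/value separator.
    static final String SEPARATOR = "\t";

    // Returns the text written for one record; empty if both parts are null.
    static String formatLine(Object key, Object value) {
        boolean nullKey = (key == null);
        boolean nullValue = (value == null);
        if (nullKey && nullValue) {
            return "";                       // nothing is written at all
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        if (!nullKey && !nullValue) {
            line.append(SEPARATOR);          // separator only when both exist
        }
        if (!nullValue) {
            line.append(value);
        }
        line.append('\n');
        return line.toString();
    }

    public static void main(String[] args) {
        // Null value: the key alone, with no trailing tab.
        System.out.print(formatLine("42", null));
        // Empty (non-null) value: the key followed by a trailing tab.
        System.out.print(formatLine("42", ""));
    }
}
```

The second call in main is the case you are seeing: an empty but non-null
value still counts as a value, so the separator is written before it.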

On Tue, Feb 3, 2009 at 7:49 PM, Jack Stahl <j...@yelp.com> wrote:

> Hello,
>
> I'm interested in a map-reduce flow where I output only values (no keys) in
> my reduce step.  For example, imagine the canonical word-counting program
> where I'd like my output to be an unlabeled histogram of counts instead of
> (word, count) pairs.
>
> I'm using HadoopStreaming (specifically, I'm using the dumbo module to run
> my python scripts).  When I simulate the map reduce using pipes and sort in
> bash, it works fine.  However, in Hadoop, if I output a value with no tabs,
> Hadoop appends a trailing "\t", apparently interpreting my output as a
> (value, "") KV pair.  I'd like to avoid outputting this trailing tab if
> possible.
>
> Is there a command line option that could be used to effect this?  More
> generally, is there something wrong with outputting arbitrary strings,
> instead of key-value pairs, in your reduce step?
>
