If you are using the standard TextOutputFormat and the output collector is passed null for the value, no trailing tab character is added to the output line.
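As a rough illustration of that rule, here is a minimal plain-Java sketch of how a line writer of this kind might decide whether to emit the separator. It is not the Hadoop source: the class name LineFormatSketch and the method formatLine are hypothetical, and the assumption that a record with both key and value null produces no line at all is mine.

```java
// Illustrative sketch only: mimics the null-handling rule described above.
// LineFormatSketch and formatLine are hypothetical names, not Hadoop APIs.
final class LineFormatSketch {
    // Assumed separator between key and value (a tab).
    static final String SEPARATOR = "\t";

    /** Returns the line that would be written, or null if nothing is written. */
    static String formatLine(Object key, Object value) {
        boolean nullKey = (key == null);
        boolean nullValue = (value == null);
        if (nullKey && nullValue) {
            return null; // assumption: an empty record produces no output line
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        // The separator is emitted only when both sides are present.
        if (!nullKey && !nullValue) {
            line.append(SEPARATOR);
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(formatLine("the", 3));  // key, tab, value
        System.out.print(formatLine(3, null));   // key only, no trailing tab
        System.out.print(formatLine(3, ""));     // empty (non-null) value keeps the tab
    }
}
```

In this sketch an empty but non-null value still gets the separator, which would match the (value, "") behavior described in the quoted message; only a true null suppresses the tab.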
Calling

    output.collect( key, null );

will give you the behavior you are looking for, if your configuration is as I expect.

On Tue, Feb 3, 2009 at 7:49 PM, Jack Stahl <j...@yelp.com> wrote:

> Hello,
>
> I'm interested in a map-reduce flow where I output only values (no keys) in
> my reduce step. For example, imagine the canonical word-counting program
> where I'd like my output to be an unlabeled histogram of counts instead of
> (word, count) pairs.
>
> I'm using HadoopStreaming (specifically, I'm using the dumbo module to run
> my python scripts). When I simulate the map reduce using pipes and sort in
> bash, it works fine. However, in Hadoop, if I output a value with no tabs,
> Hadoop appends a trailing "\t", apparently interpreting my output as a
> (value, "") KV pair. I'd like to avoid outputting this trailing tab if
> possible.
>
> Is there a command line option that could be used to effect this? More
> generally, is there something wrong with outputting arbitrary strings,
> instead of key-value pairs, in your reduce step?
>