For your reduce step, the parameter is stream.reduce.input.field.separator if you are supplying a reduce class, and I believe the output format is TextOutputFormat...
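
To make the TextOutputFormat point concrete, here is a minimal sketch in plain Java of how a TextOutputFormat-style line writer treats a null value versus an empty one. It is a simplified stand-in written for illustration, not the Hadoop class itself: the class name TextLineSketch and the methods formatLine and visible are hypothetical, and it assumes the writer skips the separator whenever the key or the value is null.

```java
/** Simplified stand-in for a TextOutputFormat-style line writer (not Hadoop code). */
public class TextLineSketch {

    /**
     * Builds one output line. Assumption: the separator is written only when
     * both key and value are non-null; a record with both null produces nothing.
     */
    static String formatLine(String key, String value, String separator) {
        boolean nullKey = (key == null);
        boolean nullValue = (value == null);
        if (nullKey && nullValue) {
            return ""; // nothing to write for an entirely empty record
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        if (!nullKey && !nullValue) {
            line.append(separator); // an empty (non-null) value still gets a separator
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.append('\n').toString();
    }

    /** Makes tabs and newlines visible for printing. */
    static String visible(String s) {
        return s.replace("\t", "\\t").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Empty value: the separator is still written, leaving a trailing tab.
        System.out.println(visible(formatLine("42", "", "\t")));   // prints 42\t\n
        // Null value: no separator, so no trailing tab.
        System.out.println(visible(formatLine("42", null, "\t"))); // prints 42\n
    }
}
```

If streaming hands the output collector an empty value rather than a null one (an assumption, but consistent with the trailing tab reported in this thread), the first case is the one that applies.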
It looks like you have tried the map parameter for the separator, not the reduce parameter.

From 0.19.0 PipeReducer:

configure:
    reduceOutFieldSeparator = job_.get("stream.reduce.output.field.separator", "\t").getBytes("UTF-8");
    reduceInputFieldSeparator = job_.get("stream.reduce.input.field.separator", "\t").getBytes("UTF-8");
    this.numOfReduceOutputKeyFields = job_.getInt("stream.num.reduce.output.key.fields", 1);

getInputSeparator:
    byte[] getInputSeparator() {
      return reduceInputFieldSeparator;
    }

reduce:
        write(key);
*       clientOut_.write(getInputSeparator());*
        write(val);
        clientOut_.write('\n');
      } else {
        // "identity reduce"
*       output.collect(key, val);*
      }

On Wed, Feb 4, 2009 at 6:15 AM, Rasit OZDAS <rasitoz...@gmail.com> wrote:

> I tried it myself, it doesn't work.
> I've also tried the stream.map.output.field.separator and
> map.output.key.field.separator parameters for this purpose; they
> don't work either. When Hadoop sees an empty string, it uses the default
> tab character instead.
>
> Rasit
>
> 2009/2/4 jason hadoop <jason.had...@gmail.com>
>
> > Oops, you are using streaming, and I am not familiar with it.
> > As a terrible hack, you could set mapred.textoutputformat.separator to the
> > empty string in your configuration.
> >
> > On Tue, Feb 3, 2009 at 9:26 PM, jason hadoop <jason.had...@gmail.com> wrote:
> >
> > > If you are using the standard TextOutputFormat, and the output collector is
> > > passed a null for the value, there will not be a trailing tab character
> > > added to the output line.
> > >
> > > output.collect( key, null );
> > >
> > > will give you the behavior you are looking for if your configuration is as
> > > I expect.
> > >
> > > On Tue, Feb 3, 2009 at 7:49 PM, Jack Stahl <j...@yelp.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> I'm interested in a map-reduce flow where I output only values (no keys)
> > >> in my reduce step. For example, imagine the canonical word-counting
> > >> program where I'd like my output to be an unlabeled histogram of counts
> > >> instead of (word, count) pairs.
> > >>
> > >> I'm using HadoopStreaming (specifically, I'm using the dumbo module to run
> > >> my python scripts). When I simulate the map reduce using pipes and sort in
> > >> bash, it works fine. However, in Hadoop, if I output a value with no tabs,
> > >> Hadoop appends a trailing "\t", apparently interpreting my output as a
> > >> (value, "") KV pair. I'd like to avoid outputting this trailing tab if
> > >> possible.
> > >>
> > >> Is there a command line option that could be used to effect this? More
> > >> generally, is there something wrong with outputting arbitrary strings,
> > >> instead of key-value pairs, in your reduce step?
>
> --
> M. Raşit ÖZDAŞ