My (0.18.2) reduce source looks like this:

    write(key);
    clientOut_.write('\t');
    write(val);
    clientOut_.write('\n');

which explains why the trailing tab is unavoidable. Thanks for your help, though, Jason!

2009/2/4 jason hadoop <jason.had...@gmail.com>

> For your reduce, the parameter is stream.reduce.input.field.separator, if
> you are supplying a reduce class and I believe the output format is
> TextOutputFormat...
>
> It looks like you have tried the map parameter for the separator, not the
> reduce parameter.
>
> From 0.19.0 PipeReducer:
> configure:
>     reduceOutFieldSeparator =
>         job_.get("stream.reduce.output.field.separator", "\t").getBytes("UTF-8");
>     reduceInputFieldSeparator =
>         job_.get("stream.reduce.input.field.separator", "\t").getBytes("UTF-8");
>     this.numOfReduceOutputKeyFields =
>         job_.getInt("stream.num.reduce.output.key.fields", 1);
>
> getInputSeparator:
>     byte[] getInputSeparator() {
>       return reduceInputFieldSeparator;
>     }
>
> reduce:
>         write(key);
> *       clientOut_.write(getInputSeparator());*
>         write(val);
>         clientOut_.write('\n');
>       } else {
>         // "identity reduce"
> *       output.collect(key, val);*
>       }
>
> On Wed, Feb 4, 2009 at 6:15 AM, Rasit OZDAS <rasitoz...@gmail.com> wrote:
>
> > I tried it myself, it doesn't work.
> > I've also tried the stream.map.output.field.separator and
> > map.output.key.field.separator parameters for this purpose; they
> > don't work either. When Hadoop sees an empty string, it uses the default
> > tab character instead.
> >
> > Rasit
> >
> > 2009/2/4 jason hadoop <jason.had...@gmail.com>
> > >
> > > Ooops, you are using streaming, and I am not familiar with it.
> > > As a terrible hack, you could set mapred.textoutputformat.separator to
> > > the empty string in your configuration.
> > >
> > > On Tue, Feb 3, 2009 at 9:26 PM, jason hadoop <jason.had...@gmail.com> wrote:
> > >
> > > > If you are using the standard TextOutputFormat, and the output
> > > > collector is passed a null for the value, there will not be a
> > > > trailing tab character added to the output line.
> > > >
> > > >     output.collect( key, null );
> > > >
> > > > will give you the behavior you are looking for if your configuration
> > > > is as I expect.
> > > >
> > > > On Tue, Feb 3, 2009 at 7:49 PM, Jack Stahl <j...@yelp.com> wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> I'm interested in a map-reduce flow where I output only values (no
> > > >> keys) in my reduce step. For example, imagine the canonical
> > > >> word-counting program where I'd like my output to be an unlabeled
> > > >> histogram of counts instead of (word, count) pairs.
> > > >>
> > > >> I'm using HadoopStreaming (specifically, I'm using the dumbo module
> > > >> to run my Python scripts). When I simulate the map reduce using
> > > >> pipes and sort in bash, it works fine. However, in Hadoop, if I
> > > >> output a value with no tabs, Hadoop appends a trailing "\t",
> > > >> apparently interpreting my output as a (value, "") KV pair. I'd
> > > >> like to avoid outputting this trailing tab if possible.
> > > >>
> > > >> Is there a command line option that could be used to effect this?
> > > >> More generally, is there something wrong with outputting arbitrary
> > > >> strings, instead of key-value pairs, in your reduce step?
> > > >>
> >
> > --
> > M. Raşit ÖZDAŞ
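
For anyone reading this thread later, here is a minimal Python sketch of the two behaviours described above. First, a reducer output line with no tab appears to be treated as a (line, "") pair, as Jack observed. Second, a TextOutputFormat-style writer skips the separator only when the value is null, as Jason described. The function names split_key_val and format_record are hypothetical, and this is a simplified model of what the thread reports, not the Hadoop source.

```python
def split_key_val(line, separator="\t"):
    """Split one reducer output line into (key, value) at the first separator.

    Assumption (from the thread): a line without a separator becomes a key
    with an empty-string value, not a missing (None) value.
    """
    key, _sep, value = line.partition(separator)
    return key, value  # value is "" when no separator was found


def format_record(key, value, separator="\t"):
    """Model of a text output writer: the separator is skipped only for None."""
    if key is None and value is None:
        return ""  # nothing to write
    parts = []
    if key is not None:
        parts.append(key)
    if key is not None and value is not None:
        parts.append(separator)
    if value is not None:
        parts.append(value)
    return "".join(parts) + "\n"


if __name__ == "__main__":
    key, value = split_key_val("42")
    print(repr(format_record(key, value)))  # '42\t\n' (empty value keeps the tab)
    print(repr(format_record("42", None)))  # '42\n'   (null value drops the tab)
```

Under this model the trailing tab comes from the empty-string value. A streaming script only emits text lines, so as far as this thread shows it has no way to hand back a null value the way a Java reduce class can with output.collect(key, null).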