You may wish to set the separator to the string comma-space ', ' for your example. Chapter 7 of my book covers this in some detail, and about a month ago I posted a graphic that visually depicts the process and the values. The original post was titled 'Changing key/value separator in hadoop streaming', and I have attached the graphic here.
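For concreteness, here is a sketch of how the separators can be set on the streaming command line. This is an illustration, not a tested invocation: the jar path is a placeholder, stream.map.output.field.separator controls where the mapper's output line is split into key and value, and mapred.textoutputformat.separator controls the string placed between the final key and value (', ' here). Check the exact property names against your Hadoop version:

```shell
# Hypothetical invocation; jar path and option values are assumptions to verify.
hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -D stream.map.output.field.separator=$'\t' \
    -D mapred.textoutputformat.separator=', ' \
    -input in.txt \
    -output out \
    -mapper mapper.sh \
    -reducer reducer.sh
```

Note that the generic -D options must come before the streaming-specific options.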
On Tue, May 12, 2009 at 7:55 PM, Alan Drew <drewsk...@yahoo.com> wrote:
>
> Hi,
>
> I have a question about the <key, values> that the reducer gets in Hadoop
> Streaming.
>
> I wrote a simple mapper.sh, reducer.sh script files:
>
> mapper.sh :
>
> #!/bin/bash
>
> while read data
> do
>   #tokenize the data and output the values <word, 1>
>   echo $data | awk '{token=0; while(++token<=NF) print $token"\t1"}'
> done
>
> reducer.sh :
>
> #!/bin/bash
>
> while read data
> do
>   echo -e $data
> done
>
> The mapper tokenizes a line of input and outputs <word, 1> pairs to
> standard output. The reducer just outputs what it gets from standard input.
>
> I have a simple input file:
>
> cat in the hat
> ate my mat the
>
> I was expecting the final output to be something like:
>
> the 1 1 1
> cat 1
>
> etc.
>
> but instead each word has its own line, which makes me think that
> <key,value> is being given to the reducer and not <key, values> which is
> default for normal Hadoop (in Java) right?
>
> the 1
> the 1
> the 1
> cat 1
>
> Is there any way to get <key, values> for the reducer and not a bunch of
> <key, value> pairs? I looked into the -reducer aggregate option, but there
> doesn't seem to be a way to customize what the reducer does with the
> <key, values> other than max,min functions.
>
> Thanks.
> --
> View this message in context:
> http://www.nabble.com/hadoop-streaming-reducer-values-tp23514523p23514523.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
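To answer the grouping question directly: Streaming never hands the reducer an iterator of values the way the Java API does. It hands the reducer one sorted key<TAB>value line at a time, so the reducer script itself must notice when the key changes. A minimal sketch in bash (sum_reducer is my own name, not anything Hadoop provides), assuming tab-separated pairs with numeric counts:

```shell
#!/bin/bash
# Streaming delivers one "key<TAB>value" pair per line, sorted by key,
# so consecutive lines that share a key can be summed as a group.
sum_reducer() {
    local prev="" sum=0 key value
    while IFS=$'\t' read -r key value
    do
        if [ "$key" = "$prev" ]
        then
            sum=$((sum + value))    # same key as last line: accumulate
        else
            # key changed: flush the previous key's total, start a new one
            [ -n "$prev" ] && printf '%s\t%s\n' "$prev" "$sum"
            prev=$key
            sum=$value
        fi
    done
    # flush the final key
    [ -n "$prev" ] && printf '%s\t%s\n' "$prev" "$sum"
}

# Example: the sorted pairs the framework would feed this reducer.
printf 'cat\t1\nthe\t1\nthe\t1\n' | sum_reducer
```

For your input this prints one total per word ("cat" 1, "the" 2 in the example above) instead of one line per pair, which is the usual way to recover <key, values> semantics in a streaming job.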