Hi Alan,

Unless you run your job with a single reducer, you will not be able to name the output files exactly YYYY-MM-DD.txt. Think about scalability: you should always append '-r-NNNNN' to the file name to allow for multiple reducers, and you can use a custom partitioner to make sure all records for a given host go to a single reducer. MultipleOutputs can do the rest, i.e. the 'YYYY-MM-DD' prefix.

Question 2 looks like a simple aggregation job: the key should be the host name, and you just need to aggregate the values for each (host, YYYY-MM-DD) pair and write them into separate 'YYYY-MM-DD-r-NNNNN' files. You can also use a secondary sort to make sure the YYYY-MM-DD values arrive in order; this way you do not need to aggregate them in memory. See Reducer.java <http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Reducer.html> for details.
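A minimal plain-Java sketch of the routing and naming described above. It has no Hadoop dependency so it runs standalone; it assumes keys of the form host_YYYY-MM-DD (as in your example) with the date after the last '_', and the class and method names are hypothetical. In a real job, partitionFor would be the body of getPartition in a subclass of org.apache.hadoop.mapreduce.Partitioner, and outputName only illustrates the 'YYYY-MM-DD-r-NNNNN' pattern; the actual file names would come from MultipleOutputs.

```java
import java.util.Locale;

/**
 * Hypothetical routing sketch: keys look like "hostX_2010-05-01".
 * Every key for the same host goes to the same reducer, and each
 * reducer writes one file per date.
 */
public class HostRouting {

    /** Host part of a "host_YYYY-MM-DD" key (assumes the date follows the last '_'). */
    static String host(String key) {
        return key.substring(0, key.lastIndexOf('_'));
    }

    /** Date part of a "host_YYYY-MM-DD" key. */
    static String date(String key) {
        return key.substring(key.lastIndexOf('_') + 1);
    }

    /**
     * Same formula as Hadoop's HashPartitioner, but hashing only the host,
     * so hostX_2010-05-01 and hostX_2010-05-02 land on the same reducer.
     */
    static int partitionFor(String key, int numReducers) {
        return (host(key).hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Output name of the form YYYY-MM-DD-r-NNNNN for a given reducer. */
    static String outputName(String date, int reducer) {
        return String.format(Locale.ROOT, "%s-r-%05d", date, reducer);
    }

    public static void main(String[] args) {
        int numReducers = 4;
        String[] keys = {"hostX_2010-05-01", "hostX_2010-05-02", "hostY_2010-05-01"};
        for (String key : keys) {
            int r = partitionFor(key, numReducers);
            System.out.println(key + " -> reducer " + r + " -> " + outputName(date(key), r));
        }
    }
}
```

Because only the host is hashed, all of a host's days land on one reducer, so each host appears in exactly one YYYY-MM-DD-r-NNNNN file per day.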
Alex K

On Wed, May 12, 2010 at 3:04 PM, Alan Miller wrote:

> Hi Alex,
>
> The tab isn't the issue (yet). I guess it's really 2 questions I have.
> Using the reducer inputs already mentioned.
>
> 1. How do I generate multiple output files named YYYY-MM-DD.txt
> 2. Each file should contain
>    a. one line per host
>    b. each line with host avg1 avg2 avg3 ....
>
> Alan
>
> On 05/12/2010 11:50 PM, Alex Kozlov wrote:
>
> Hi Alan,
>
> Is the problem that you want your 'value' vals to be tab separated? This
> is entirely under control of your reducer.
>
> Alex K
>
> On Wed, May 12, 2010 at 2:07 PM, Alan Miller wrote:
>
>> Hi all,
>>
>> How can I write tab-delimited output files from my reducer?
>>
>> My reducer gets Text/Text key/vals like:
>>
>> hostX_2010-05-01 varA=valA1,varB=valB1,varC=valC1
>> hostX_2010-05-01 varA=valA2,varB=valB2,varC=valC2
>> hostX_2010-05-01 varA=valA3,varB=valB3,varC=valC3
>> ...
>> hostY_2010-05-01 varA=valA1,varB=valB1,varC=valC1
>> hostY_2010-05-01 varA=valA2,varB=valB2,varC=valC2
>> hostY_2010-05-01 varA=valA3,varB=valB3,varC=valC3
>> ...
>>
>> After my reducer calcs the daily averages of varA,B,C
>> I want to write a tab-delimited file with lines like:
>>
>> hostX varA-Avg varB-Avg varC-Avg ....
>> hostY varA-Avg varB-Avg varC-Avg ....
>>
>> Thanks,
>> Alan
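For the per-host line itself (question 2), a minimal plain-Java sketch of the averaging step, assuming the reduce key is host_YYYY-MM-DD as in Alan's example and each value looks like "varA=1.5,varB=2.0,varC=3.25". The class name, method name, and numeric values are hypothetical stand-ins for valA1, valB1, and so on. It keeps running sums per variable and joins the host and the averages with tabs; with TextOutputFormat, which separates key and value with a tab by default, writing the host as the key and the joined averages as the value would give the same line.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class DailyAverages {

    /**
     * Averages all values seen for one "host_YYYY-MM-DD" key and returns one
     * tab-delimited line: the host, then the average of each variable in name order.
     */
    static String averageLine(String key, Iterable<String> values) {
        String host = key.substring(0, key.lastIndexOf('_'));
        // Running sum [0] and count [1] per variable; TreeMap keeps columns in a stable order.
        Map<String, double[]> acc = new TreeMap<>();
        for (String value : values) {
            for (String pair : value.split(",")) {
                String[] kv = pair.split("=", 2);
                double[] sumCount = acc.computeIfAbsent(kv[0].trim(), k -> new double[2]);
                sumCount[0] += Double.parseDouble(kv[1].trim());
                sumCount[1] += 1;
            }
        }
        StringJoiner line = new StringJoiner("\t");
        line.add(host);
        for (double[] sumCount : acc.values()) {
            line.add(String.format(Locale.ROOT, "%.2f", sumCount[0] / sumCount[1]));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Hypothetical numeric values standing in for valA1, valB1, ... from the thread.
        List<String> values = List.of(
            "varA=1.0,varB=10.0,varC=100.0",
            "varA=2.0,varB=20.0,varC=200.0",
            "varA=3.0,varB=30.0,varC=300.0");
        System.out.println(averageLine("hostX_2010-05-01", values)); // hostX	2.00	20.00	200.00
    }
}
```

Summing as the values stream in keeps memory per reduce call proportional to the number of variables rather than the number of records.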
