On Fri, Sep 21, 2007 at 01:53:21PM -0700, Joydeep Sen Sarma wrote:
>Why don't you create/write to hdfs files directly from the reduce job
>(don't depend on the default reduce output dir/files)?
>
>Like the cases where the input is not homogeneous, this seems (at least
>to me) to be another common pattern (the output is not homogeneous). I
>have run into this when loading data into hadoop (and wanting to
>organize different types of records into different dirs/files).
>Just make sure (somehow) that different reduce tasks don't try to
>write to the same file.

Quick note: as long as you create files in the 'mapred.output.dir'
directory (via map/reduce tasks) on hdfs, the framework will handle the
issues with speculative tasks etc.

Arun

>-----Original Message-----
>From: C G [mailto:[EMAIL PROTECTED]
>Sent: Friday, September 21, 2007 1:20 PM
>To: [email protected]
>Subject: Multiple output files, and controlling output file name...
>
>Hi All:
>
>  In the context of using the aggregation classes, is there any way to
>send output to multiple files? In my case, I am processing columnar
>records that are very wide. I have to do a variety of different
>aggregations, and the results of each type of aggregation are a set of
>rows suitable for loading into a database. Rather than write all the
>records to "part-00000", etc., I'd like to write them to a series of
>files based on the aggregation type. I don't see an obvious way to do
>this...is it possible?
>
>  Also, for those of us who don't like "part-00000" and so forth as
>naming conventions, is there a way to name the output? In my case,
>incorporating a date/time stamp like "loadA-200709221600" would be
>very useful.
>
>  Thanks for any advice,
>  C G
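
To make Arun's suggestion concrete, below is a minimal sketch of a
reducer (old org.apache.hadoop.mapred API, roughly what existed in
2007) that bypasses the OutputCollector and writes each record type to
its own side file under 'mapred.output.dir'. The class name, the
tab-separated "type<TAB>key" convention, and the file-naming scheme are
illustrative assumptions, not anything from the thread; it relies on
the behavior Arun describes, where files a task creates under the job
output directory are promoted or cleaned up by the framework on task
commit.

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Illustrative reducer: writes each record type to its own file under
// the job output directory instead of the usual part-NNNNN files.
public class TypeSplitReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  private FileSystem fs;
  private Path outDir;
  private int partition;
  private Map<String, FSDataOutputStream> streams =
      new HashMap<String, FSDataOutputStream>();

  public void configure(JobConf job) {
    try {
      fs = FileSystem.get(job);
      // Per Arun's note: create side files under 'mapred.output.dir'
      // from within the task so the framework handles cleanup and
      // promotion for failed and speculative attempts.
      outDir = new Path(job.get("mapred.output.dir"));
      // Fold the reduce partition into file names so different reduce
      // tasks never produce the same final path.
      partition = job.getInt("mapred.task.partition", 0);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Assumption: keys arrive as "type<TAB>realKey", so the record
    // type (e.g. "loadA") is recoverable here. The OutputCollector is
    // deliberately unused.
    String type = key.toString().split("\t", 2)[0];
    FSDataOutputStream out = streams.get(type);
    if (out == null) {
      out = fs.create(new Path(outDir, type + "-" + partition));
      streams.put(type, out);
    }
    while (values.hasNext()) {
      out.write(values.next().toString().getBytes("UTF-8"));
      out.write('\n');
    }
  }

  public void close() throws IOException {
    for (FSDataOutputStream stream : streams.values()) {
      stream.close();
    }
  }
}

A date/time stamp of the kind C G asks about ("loadA-200709221600")
could be folded into the same naming scheme, e.g. formatted once in
configure() and used as part of the file-name prefix.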
