Why don't you create/write to HDFS files directly from the reduce job
(rather than depending on the default reduce output dir/files)?

Like the cases where the input is not homogeneous, this seems (at least
to me) to be another common pattern (output that is not homogeneous). I
have run into this when loading data into Hadoop (and wanting to
organize different types of records into different dirs/files). Just
make sure (somehow) that different reduce tasks don't try to write to
the same file.
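As a rough sketch of that fan-out pattern (plain local files standing in for the HDFS FileSystem calls here; the class name, the type-to-file naming scheme, and the task-ID suffix used to keep reducers from colliding are all illustrative assumptions, not anything from Hadoop itself):

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.HashMap;
import java.util.Map;

// Illustrative helper: routes each record to a file named after its
// record type, with a per-task suffix so two reduce tasks never open
// the same file.
public class FanOutWriter {
    private final Map<String, PrintWriter> writers = new HashMap<String, PrintWriter>();
    private final File dir;
    private final String taskSuffix; // e.g. the reduce task ID, "00000"

    public FanOutWriter(File dir, String taskSuffix) {
        this.dir = dir;
        this.taskSuffix = taskSuffix;
    }

    // Lazily open one output file per record type, e.g. "typeA-00000".
    public void write(String recordType, String record) throws IOException {
        PrintWriter w = writers.get(recordType);
        if (w == null) {
            w = new PrintWriter(new FileWriter(new File(dir, recordType + "-" + taskSuffix)));
            writers.put(recordType, w);
        }
        w.println(record);
    }

    // Flush and close everything when the task finishes.
    public void close() {
        for (PrintWriter w : writers.values()) {
            w.close();
        }
    }
}
```

In an actual reduce task you'd open the files against HDFS (via the FileSystem API) instead of java.io, and build the suffix from the task's own ID so the "don't write to the same file" rule holds automatically.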



-----Original Message-----
From: C G [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 21, 2007 1:20 PM
To: [email protected]
Subject: Multiple output files, and controlling output file name...

Hi All:
   
  In the context of using the aggregation classes, is there any way to
send output to multiple files?  In my case, I am processing columnar
records that are very wide.  I have to do a variety of different
aggregations, and the results of each type of aggregation are a set of
rows suitable for loading into a database.  Rather than write all the
records to "part-00000", etc., I'd like to write them to a series of
files based on the aggregation type.  I don't see an obvious way to do
this... is it possible?
   
  Also, for those of us that don't like "part-00000" and so forth as
naming conventions, is there a way to name the output?  In my case,
incorporating a date/time stamp like "loadA-200709221600" would be very
useful.
   
  Thanks for any advice,
  C G

