On Fri, Sep 21, 2007 at 01:53:21PM -0700, Joydeep Sen Sarma wrote:
>Why don't you create/write to hdfs files directly from the reduce job
>(don't depend on the default reduce output dir/files)?
>
>Like the cases where the input is not homogeneous, this seems (at least
>to me) to be another common pattern (the output is not homogeneous). I
>have run into this when loading data into hadoop (and wanting to
>organize different types of records into different dirs/files).
>Just make sure (somehow) that different reduce tasks don't try to
>write to the same file.

Quick note: as long as you create files in the 'mapred.output.dir'
directory (via map/reduce tasks) on hdfs, the framework will handle the
issues with speculative tasks etc.

Arun

>-----Original Message-----
>From: C G [mailto:[EMAIL PROTECTED]
>Sent: Friday, September 21, 2007 1:20 PM
>To: [email protected]
>Subject: Multiple output files, and controlling output file name...
>
>Hi All:
>
>  In the context of using the aggregation classes, is there any way to
>send output to multiple files? In my case, I am processing columnar
>records that are very wide. I have to do a variety of different
>aggregations, and the results of each type of aggregation are a set of
>rows suitable for loading into a database. Rather than write all the
>records to "part-00000", etc., I'd like to write them to a series of
>files based on the aggregation type. I don't see an obvious way to do
>this...is it possible?
>
>  Also, for those of us who don't like "part-00000" and so forth as
>naming conventions, is there a way to name the output? In my case,
>incorporating a date/time stamp like "loadA-200709221600" would be
>very useful.
>
>  Thanks for any advice,
>  C G
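
To make Arun's suggestion concrete, below is a minimal sketch of a
reducer (old org.apache.hadoop.mapred API, roughly what existed in
2007) that bypasses the OutputCollector and writes each record type to
its own side file under 'mapred.output.dir'. The class name, the
tab-separated "type<TAB>key" convention, and the file-naming scheme are
illustrative assumptions, not anything from the thread; it relies on
the behavior Arun describes, where files a task creates under the job
output directory are promoted or cleaned up by the framework on task
commit.

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Illustrative reducer: writes each record type to its own file under
// the job output directory instead of the usual part-NNNNN files.
public class TypeSplitReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  private FileSystem fs;
  private Path outDir;
  private int partition;
  private Map<String, FSDataOutputStream> streams =
      new HashMap<String, FSDataOutputStream>();

  public void configure(JobConf job) {
    try {
      fs = FileSystem.get(job);
      // Per Arun's note: create side files under 'mapred.output.dir'
      // from within the task so the framework handles cleanup and
      // promotion for failed and speculative attempts.
      outDir = new Path(job.get("mapred.output.dir"));
      // Fold the reduce partition into file names so different reduce
      // tasks never produce the same final path.
      partition = job.getInt("mapred.task.partition", 0);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Assumption: keys arrive as "type<TAB>realKey", so the record
    // type (e.g. "loadA") is recoverable here. The OutputCollector is
    // deliberately unused.
    String type = key.toString().split("\t", 2)[0];
    FSDataOutputStream out = streams.get(type);
    if (out == null) {
      out = fs.create(new Path(outDir, type + "-" + partition));
      streams.put(type, out);
    }
    while (values.hasNext()) {
      out.write(values.next().toString().getBytes("UTF-8"));
      out.write('\n');
    }
  }

  public void close() throws IOException {
    for (FSDataOutputStream stream : streams.values()) {
      stream.close();
    }
  }
}

A date/time stamp of the kind C G asks about ("loadA-200709221600")
could be folded into the same naming scheme, e.g. formatted once in
configure() and used as part of the file-name prefix.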
