Gabriel Reid created CRUNCH-91:
----------------------------------

             Summary: Enable custom output file naming
                 Key: CRUNCH-91
                 URL: https://issues.apache.org/jira/browse/CRUNCH-91
             Project: Crunch
          Issue Type: Improvement
            Reporter: Gabriel Reid


Crunch currently names output files using the classic Hadoop style 
(e.g. part-m-00001, part-r-00002), with the numerical part of the filename 
set based on the number of existing files in the output directory to avoid 
naming collisions.

The intention of this issue is to allow developers to define their own output 
file names for Crunch output files.

The original motivation for this issue is a job with a custom partitioner 
that routes records to a specific partition (and therefore reducer) based on 
the content of the record. The output files of such a job then need to be 
renamed so that their names include specific information about the partition 
they contain. Crunch currently discards the partition number of output files, 
making this renaming impossible. The approach proposed here (custom file 
naming within Crunch) goes one step further, giving developers a hook to 
define their own output file naming scheme.
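
As a rough illustration of what such a hook could look like, here is a 
minimal sketch in plain Java. Everything in it is hypothetical: the 
OutputFileNamer interface, the OutputFileNamers helper and the method names 
are assumptions made for this example, not part of the existing Crunch or 
Hadoop APIs. It only shows the idea of passing the partition number to a 
user-supplied naming function instead of discarding it.

```java
import java.util.Locale;
import java.util.function.IntFunction;

/** Hypothetical naming hook (an assumption for illustration, not a Crunch API). */
interface OutputFileNamer {
    /**
     * @param mapOnly   true if the file is written by a map-only stage
     * @param partition the partition (reducer) number that produced the file
     * @return the name to use for the output file
     */
    String fileName(boolean mapOnly, int partition);
}

/** Hypothetical helpers that build common naming schemes. */
final class OutputFileNamers {
    private OutputFileNamers() {}

    /** Mimics the classic Hadoop-style names, e.g. part-r-00002. */
    static final OutputFileNamer HADOOP_STYLE = (mapOnly, partition) ->
        String.format(Locale.ROOT, "part-%s-%05d", mapOnly ? "m" : "r", partition);

    /** Puts a caller-supplied label for the partition in front of its number. */
    static OutputFileNamer labeled(IntFunction<String> labelOf) {
        return (mapOnly, partition) ->
            String.format(Locale.ROOT, "%s-%05d", labelOf.apply(partition), partition);
    }
}
```

With these definitions, OutputFileNamers.HADOOP_STYLE.fileName(false, 2) 
yields part-r-00002, while a namer built with 
OutputFileNamers.labeled(p -> p == 0 ? "eu" : "us") would name the file 
for partition 1 us-00001. How such a namer would be registered with a 
Crunch target is left open here.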

