Gabriel Reid created CRUNCH-91:
----------------------------------
Summary: Enable custom output file naming
Key: CRUNCH-91
URL: https://issues.apache.org/jira/browse/CRUNCH-91
Project: Crunch
Issue Type: Improvement
Reporter: Gabriel Reid
The current output file naming behavior in Crunch is to use the classic
Hadoop-style file naming (e.g. part-m-00001, part-r-00002), with the numerical
part of the filename set based on the number of existing files in the
output directory to avoid naming collisions.
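As a rough illustration of the behavior described above, the following sketch derives a part file name from the count of files already in the output directory. The class and method names are hypothetical and are not taken from Crunch; using the count directly as the sequence number is also an assumption.

```java
// Hypothetical helper for illustration only; this is not Crunch's actual code.
final class SequentialPartNaming {

    /**
     * Builds a Hadoop-style part file name.
     * taskType is 'm' for map output or 'r' for reduce output.
     * existingFileCount is the number of files already in the output
     * directory (assumed here to be used directly as the sequence number).
     */
    static String nextPartName(char taskType, int existingFileCount) {
        return String.format("part-%c-%05d", taskType, existingFileCount);
    }

    public static void main(String[] args) {
        System.out.println(nextPartName('m', 1)); // part-m-00001
        System.out.println(nextPartName('r', 2)); // part-r-00002
    }
}
```

Note that nothing in such a name reflects which partition a reducer handled; the number only depends on what is already in the directory.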
The intention of this issue is to allow developers to define their own output
file names for Crunch output files.
The original motivation for this issue is a job with a custom partitioner
which routes records to a specific partition (and therefore reducer) based on
the content of the record. The output files of such a job then need to be
renamed so that their names include specific information about the partition
they contain. The partition number of files currently gets discarded by
Crunch, making this renaming impossible. The approach proposed here (custom
file naming within Crunch) goes one step further, giving developers a hook to
define their own output file naming scheme.
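One possible shape for such a hook is sketched below. This is only an illustration of the idea, not the API proposed or implemented for this issue: the interface OutputFileNamer, its methods, and both implementations are hypothetical names. The key point is that the reducer's partition number is passed to the hook, so a naming scheme can include it.

```java
// Hypothetical sketch of a naming hook; none of these names come from Crunch.
final class NamingHookDemo {

    /** Hook that lets a developer decide what output files are called. */
    interface OutputFileNamer {
        /** Name for a map-only output file; sequence is a collision-avoiding counter. */
        String mapOutputName(int sequence);

        /** Name for a reducer output file; partitionId is the reducer's partition number. */
        String reduceOutputName(int sequence, int partitionId);
    }

    /** Mimics the classic naming, which ignores the partition number. */
    static final class SequentialNamer implements OutputFileNamer {
        public String mapOutputName(int sequence) {
            return String.format("part-m-%05d", sequence);
        }

        public String reduceOutputName(int sequence, int partitionId) {
            return String.format("part-r-%05d", sequence);
        }
    }

    /** Custom naming that keeps the partition number in the reducer file name. */
    static final class PartitionAwareNamer implements OutputFileNamer {
        public String mapOutputName(int sequence) {
            return String.format("part-m-%05d", sequence);
        }

        public String reduceOutputName(int sequence, int partitionId) {
            return String.format("partition-%03d-r-%05d", partitionId, sequence);
        }
    }

    public static void main(String[] args) {
        OutputFileNamer classic = new SequentialNamer();
        OutputFileNamer custom = new PartitionAwareNamer();
        System.out.println(classic.reduceOutputName(2, 7)); // part-r-00002
        System.out.println(custom.reduceOutputName(2, 7));  // partition-007-r-00002
    }
}
```

With a hook along these lines, the custom partitioner use case no longer needs a separate renaming step, since the partition number is available at the moment the file name is chosen.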