[ https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739504#action_12739504 ]
Tom White commented on MAPREDUCE-370:
-------------------------------------
The only feature that MultipleOutputs needs to make it at least as powerful as
MultipleOutputFormat is the ability to control the output file name. At present
the MultipleOutputs file name is
{noformat}<namedOutput>_<multiName>-(m|r)-<part-number>{noformat}
whereas in MultipleOutputFormat you have complete control over the naming,
including the ability to create subdirectories by having a path separator
({{/}}) in the name.
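As a rough illustration of the current pattern, here is a minimal plain-Java sketch. It does not use Hadoop; the class and method names ({{CurrentNaming}}, {{fileName}}) are hypothetical, and the five-digit zero-padded part number is an assumption based on the usual {{part-r-00000}} convention.

```java
import java.util.Locale;

class CurrentNaming {
    // Hypothetical helper, not a Hadoop API: builds
    // <namedOutput>_<multiName>-(m|r)-<part-number>.
    static String fileName(String namedOutput, String multiName,
                           boolean isMap, int partition) {
        String taskType = isMap ? "m" : "r";
        // Assumption: part numbers are zero-padded to five digits.
        return String.format(Locale.ROOT, "%s_%s-%s-%05d",
                namedOutput, multiName, taskType, partition);
    }

    public static void main(String[] args) {
        // Named output "text", multi name "US", reduce task, partition 3.
        System.out.println(fileName("text", "US", false, 3)); // text_US-r-00003
    }
}
```

The application only supplies the multi name here; the prefix and the suffix are fixed by the framework, which is why subdirectories cannot be expressed.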
To achieve this, I think we could port MultipleOutputs and change the
semantics of getCollector() in the multi name case, so that the multi name is
the full name of the output file. This method is typically invoked
in the reduce() method, where the key and value are available, and can be used
to form the name. Applications that want to add a unique suffix can call
FileOutputFormat#getUniqueFile() themselves.
The single name case would work as before and create a single output file for a
named output.
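The proposed multi name semantics might look roughly like the plain-Java sketch below. Nothing here is Hadoop code: {{ProposedNaming}}, {{multiNameFor}} and {{withUniqueSuffix}} are hypothetical names, the key and date fields are invented for the example, and {{withUniqueSuffix}} is only a stand-in for whatever suffix FileOutputFormat#getUniqueFile() would produce.

```java
import java.util.Locale;

class ProposedNaming {
    // Under the proposal the multi name is used verbatim as the output
    // file name, so a "/" in it creates a subdirectory.
    // Hypothetical: derives the name from a reduce key and a date field.
    static String multiNameFor(String key, String date) {
        return key + "/" + date + "/events";
    }

    // Hypothetical stand-in for a unique suffix that an application could
    // choose to add itself. The suffix format is an assumption.
    static String withUniqueSuffix(String name, boolean isMap, int partition) {
        return String.format(Locale.ROOT, "%s-%s-%05d",
                name, isMap ? "m" : "r", partition);
    }

    public static void main(String[] args) {
        String name = multiNameFor("US", "2009-08-05");
        System.out.println(name);                             // US/2009-08-05/events
        System.out.println(withUniqueSuffix(name, false, 3)); // US/2009-08-05/events-r-00003
    }
}
```

Leaving the suffix to the application keeps the default simple while still letting tasks avoid clobbering each other's files when several of them write to the same name.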
> Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
> -------------------------------------------------------------------
>
> Key: MAPREDUCE-370
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
>