[jira] Commented: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.

Tom White (JIRA) Wed, 05 Aug 2009 07:01:38 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739504#action_12739504
 ]


Tom White commented on MAPREDUCE-370:
-------------------------------------

The only feature that MultipleOutputs needs to make it at least as powerful as 
MultipleOutputFormat is the ability to control the output file name. At present 
the MultipleOutputs file name is 

{noformat}<namedOutput>_<multiName>-(m|r)-<part-number>{noformat}

whereas in MultipleOutputFormat you have complete control over the naming, 
including the ability to create subdirectories by having a path separator 
({{/}}) in the name.
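To make the current scheme concrete, here is a small, self-contained Java sketch that builds the file name described above for a multi named output. It assumes the part number is zero-padded to five digits, as in Hadoop's familiar part-00000 style files. `currentName` is a hypothetical helper that illustrates the format; it is not part of MultipleOutputs.

```java
import java.util.Locale;

// Hypothetical helper reproducing the existing MultipleOutputs naming rule:
// <namedOutput>_<multiName>-(m|r)-<part-number>
// Assumption: the part number is zero-padded to five digits.
class CurrentMultiName {

    static String currentName(String namedOutput, String multiName,
                              boolean isMapTask, int partition) {
        char taskType = isMapTask ? 'm' : 'r';   // map or reduce task
        return String.format(Locale.ROOT, "%s_%s-%c-%05d",
                namedOutput, multiName, taskType, partition);
    }

    public static void main(String[] args) {
        // Reduce task 3 writing multi name "2009" of named output "sales".
        System.out.println(currentName("sales", "2009", false, 3));
    }
}
```

The application controls only the `multiName` part; the named output prefix and the task suffix are always added, so there is no way to choose the full name or place the file in a subdirectory.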

To achieve this, I think we could port MultipleOutputs to the new API and change 
the semantics of getCollector() in the multi name case, so that the multi name is 
the full name of the output file. This method is typically invoked in the 
reduce() method, where the key and value are available and can be used to form 
the name. Applications that want to add a unique suffix can call 
FileOutputFormat#getUniqueFile() themselves.
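A minimal Java sketch of the proposed semantics follows, with no Hadoop dependency. It assumes a key of the form "2009-08" and derives the full file name from it, using a "/" to create one subdirectory per year. `fileNameFor` and `uniqueFile` are hypothetical helpers; `uniqueFile` only mimics the "-(m|r)-NNNNN" suffix that FileOutputFormat#getUniqueFile() appends (zero-padded partition, then the extension) and is not the Hadoop method itself.

```java
import java.util.Locale;

// Sketch of the proposal: the multi name passed to getCollector() is the
// full output file name, built by the application in reduce().
class ProposedMultiName {

    // Hypothetical naming rule: key "2009-08" -> "2009/08", i.e. a "2009"
    // subdirectory containing a file named "08".
    static String fileNameFor(String key) {
        String[] parts = key.split("-");
        return parts[0] + "/" + parts[1];
    }

    // Hypothetical helper mimicking the unique suffix added by
    // FileOutputFormat#getUniqueFile(): name-(m|r)-NNNNN plus extension.
    static String uniqueFile(String name, boolean isMapTask,
                             int partition, String extension) {
        return String.format(Locale.ROOT, "%s-%c-%05d%s",
                name, isMapTask ? 'm' : 'r', partition, extension);
    }

    public static void main(String[] args) {
        String name = fileNameFor("2009-08");
        // Without a suffix the application gets exactly the name it chose.
        System.out.println(name);
        // An application that may write the same name from several reducers
        // can opt in to a unique suffix to avoid collisions.
        System.out.println(uniqueFile(name, false, 3, ".txt"));
    }
}
```

Keeping the suffix opt-in leaves the naming decision entirely with the application, which matches the control MultipleOutputFormat already gives.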

The single name case would work as before and create a single output file for a 
named output.

> Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-370
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
