[ https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738185#action_12738185 ]
Amareshwari Sriramadasu commented on MAPREDUCE-370:
---------------------------------------------------
I propose the following API for
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs, combining both
org.apache.hadoop.mapred.lib.MultipleOutputs and
org.apache.hadoop.mapred.lib.MultipleOutputFormat:
{code}
public class MultipleOutputs<KEYOUT, VALUEOUT> {

  public MultipleOutputs(TaskInputOutputContext context) {
  }

  // Adds a named output for the job.
  public static void addNamedOutput(Job job, String namedOutput,
      Class<? extends FileOutputFormat> outputFormatClass,
      Class<?> keyClass, Class<?> valueClass);

  // Adds a multi-named output for the job.
  public static void addMultiNamedOutput(Job job, String namedOutput,
      Class<? extends FileOutputFormat> outputFormatClass,
      Class<?> keyClass, Class<?> valueClass);

  // Enables counters for named outputs.
  public static void setCountersEnabled(Job job, boolean enabled);

  // Writes to a named output.
  public <K, V> void write(String namedOutput, K key, V value)
      throws IOException, InterruptedException;

  // Writes to a multi-named output.
  public <K, V> void write(String namedOutput, String multiName, K key, V value)
      throws IOException, InterruptedException;

  // Writes to an output file whose name depends on the key, value and context.
  // The record writer is obtained from the job's output format, which must be
  // a FileOutputFormat; throws IOException if it is not.
  public void write(KEYOUT key, VALUEOUT value)
      throws IOException, InterruptedException;

  // Closes all outputs.
  public void close() throws IOException, InterruptedException;

  /**
   * Generate the leaf name for the output file name. The default behavior does
   * not change the leaf file name (such as part-*-00000).
   */
  protected String generateLeafFileName(String name) {
    return name;
  }

  /**
   * Generate the output file name based on the key, value, context and the
   * leaf file name. The default behavior does not depend on the key, value
   * and context.
   */
  protected String generateFileName(KEYOUT key, VALUEOUT value,
      TaskAttemptContext context, String name) {
    return name;
  }

  /**
   * Generate the actual key from the given key/value. The default behavior is
   * that the actual key is equal to the given key.
   *
   * @param key
   *          the key of the output data
   * @param value
   *          the value of the output data
   * @return the actual key derived from the given key/value
   */
  protected <K, V> K generateActualKey(K key, V value) {
    return key;
  }

  /**
   * Generate the actual value from the given key and value. The default
   * behavior is that the actual value is equal to the given value.
   *
   * @param key
   *          the key of the output data
   * @param value
   *          the value of the output data
   * @return the actual value derived from the given key/value
   */
  protected <K, V> V generateActualValue(K key, V value) {
    return value;
  }
}
{code}
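To make the hook contract of write(KEYOUT key, VALUEOUT value) concrete, here is a minimal, self-contained sketch of how the protected hooks might compose. It assumes the same order the old MultipleOutputFormat uses: leaf name first, then the key/value-based file name, then the actual key and value, with one cached record writer per resulting path. InMemoryMultipleOutputs and PerKeyOutputs are hypothetical names; lists of strings stand in for Hadoop RecordWriters, the TaskAttemptContext parameter of generateFileName is dropped, and the generic <K,V> hooks are narrowed to KEYOUT/VALUEOUT so the example needs no Hadoop dependency.

{code}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyBasedOutputsSketch {

  // Hypothetical in-memory stand-in for the proposed MultipleOutputs:
  // each "record writer" is a list of lines, cached by output path.
  static class InMemoryMultipleOutputs<KEYOUT, VALUEOUT> {
    final Map<String, List<String>> writers = new LinkedHashMap<>();
    private final String taskFileName; // e.g. "part-r-00000"

    InMemoryMultipleOutputs(String taskFileName) {
      this.taskFileName = taskFileName;
    }

    // Same defaults as the proposal: every hook is the identity.
    protected String generateLeafFileName(String name) { return name; }
    protected String generateFileName(KEYOUT key, VALUEOUT value, String name) { return name; }
    protected KEYOUT generateActualKey(KEYOUT key, VALUEOUT value) { return key; }
    protected VALUEOUT generateActualValue(KEYOUT key, VALUEOUT value) { return value; }

    public void write(KEYOUT key, VALUEOUT value) {
      String leaf = generateLeafFileName(taskFileName);
      String path = generateFileName(key, value, leaf);
      // Reuse the writer for this path, creating it on first use.
      writers.computeIfAbsent(path, p -> new ArrayList<>())
          .add(generateActualKey(key, value) + "\t" + generateActualValue(key, value));
    }
  }

  // Hypothetical subclass: one output directory per key.
  static class PerKeyOutputs extends InMemoryMultipleOutputs<String, Integer> {
    PerKeyOutputs(String taskFileName) { super(taskFileName); }

    @Override
    protected String generateFileName(String key, Integer value, String name) {
      return key + "/" + name;
    }
  }

  public static void main(String[] args) {
    PerKeyOutputs out = new PerKeyOutputs("part-r-00000");
    out.write("apple", 3);
    out.write("banana", 5);
    out.write("apple", 7);
    out.writers.forEach((path, lines) -> System.out.println(path + " -> " + lines));
  }
}
{code}

After main runs, writers holds two paths: apple/part-r-00000 with both apple records and banana/part-r-00000 with the banana record. Overriding only generateFileName is enough to split output by key, which is the common MultipleOutputFormat use case.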
Thoughts?
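The named and multi-named write methods need a rule for turning namedOutput and multiName into a base file name. A minimal sketch, assuming the proposal keeps the checks of the old org.apache.hadoop.mapred.lib.MultipleOutputs (names of ASCII letters and digits only, "part" reserved for the default output, and namedOutput + "_" + multiName for multi-named outputs). NamedOutputNames and its methods are hypothetical helpers, not Hadoop API.

{code}
public class NamedOutputNames {

  // Assumed rule: names may contain only ASCII letters and digits.
  static void checkTokenName(String name) {
    if (name == null || name.isEmpty()) {
      throw new IllegalArgumentException("Name cannot be null or empty");
    }
    for (char ch : name.toCharArray()) {
      boolean ok = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')
          || (ch >= '0' && ch <= '9');
      if (!ok) {
        throw new IllegalArgumentException("Name cannot contain '" + ch + "'");
      }
    }
  }

  // Assumed rule: "part" is reserved for the job's default output.
  static void checkNamedOutputName(String namedOutput) {
    checkTokenName(namedOutput);
    if (namedOutput.equals("part")) {
      throw new IllegalArgumentException("Named output cannot be 'part'");
    }
  }

  // Base file name: the named output alone, or namedOutput_multiName
  // for a multi-named output.
  static String baseFileName(String namedOutput, String multiName, boolean isMulti) {
    checkNamedOutputName(namedOutput);
    if (!isMulti && multiName != null) {
      throw new IllegalArgumentException("'" + namedOutput + "' is not a multi-named output");
    }
    if (isMulti) {
      checkTokenName(multiName);
      return namedOutput + "_" + multiName;
    }
    return namedOutput;
  }

  public static void main(String[] args) {
    System.out.println(baseFileName("text", null, false));         // text
    System.out.println(baseFileName("sequence", "jan2009", true));  // sequence_jan2009
    try {
      baseFileName("part", null, false);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
{code}

Under these assumptions, write("sequence", "jan2009", key, value) would go to files whose names start with sequence_jan2009, while plain write(key, value) keeps the default part-* names.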
> Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
> -------------------------------------------------------------------
>
> Key: MAPREDUCE-370
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
>