[jira] Commented: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.

Amareshwari Sriramadasu (JIRA) Sun, 02 Aug 2009 21:46:42 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738185#action_12738185 ]


Amareshwari Sriramadasu commented on MAPREDUCE-370:
---------------------------------------------------

I propose the following API for
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs, combining both
org.apache.hadoop.mapred.lib.MultipleOutputs and
org.apache.hadoop.mapred.lib.MultipleOutputFormat:
{code}
public class MultipleOutputs<KEYOUT, VALUEOUT>  {

  public MultipleOutputs(TaskInputOutputContext context) {
  }

  // Adds a named output for the job.
  public static void addNamedOutput(Job job, String namedOutput,
      Class<? extends FileOutputFormat> outputFormatClass,
      Class<?> keyClass, Class<?> valueClass);

  // Adds a multi-named output for the job.
  public static void addMultiNamedOutput(Job job, String namedOutput,
      Class<? extends FileOutputFormat> outputFormatClass,
      Class<?> keyClass, Class<?> valueClass);

  // Enables counters for named outputs
  public static void setCountersEnabled(Job job, boolean enabled);

  // Write to a named output.
  public <K,V> void write(String namedOutput, K key, V value)
          throws IOException, InterruptedException;

  // Writes to a multi-named output.
  public <K,V> void write(String namedOutput, String multiName, K key, V value)
          throws IOException, InterruptedException;

  // Writes to an output file whose name depends on key, value and context.
  // Gets the record writer from the job's output format, which should be a
  // FileOutputFormat; throws IOException if it is not.
  public  void write(KEYOUT key, VALUEOUT value) 
          throws IOException, InterruptedException;

  // Closes all outputs.
  public void close() throws IOException, InterruptedException;

  /**
   * Generate the leaf name for the output file name. The default behavior does
   * not change the leaf file name (such as part-*-00000)
   */
  protected String generateLeafFileName(String name) {
    return name;
  }

  /**
   * Generate the output file name based on key, value, context and the
   * leaf file name. The default behavior does not depend on key, value
   * and context.
   */
  protected String generateFileName(KEYOUT key, VALUEOUT value,
      TaskAttemptContext context, String name) {
    return name;
  }

  /**
   * Generate the actual key from the given key/value. The default behavior
   * is that the actual key is equal to the given key.
   * 
   * @param key
   *          the key of the output data
   * @param value
   *          the value of the output data
   * @return the actual key derived from the given key/value
   */
  protected <K,V> K generateActualKey(K key, V value) {
    return key;
  }
  
  /**
   * Generate the actual value from the given key and value. The default
   * behavior is that the actual value is equal to the given value.
   * 
   * @param key
   *          the key of the output data
   * @param value
   *          the value of the output data
   * @return the actual value derived from the given key/value
   */
  protected <K,V> V generateActualValue(K key, V value) {
    return value;
  }
}
{code}
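
To illustrate how the overridable hooks above could compose, here is a minimal, Hadoop-free sketch. Everything in it is hypothetical and for illustration only: the class names MultipleOutputsSketch and KeyPartitionedOutputs, the in-memory map that stands in for record writers, the omission of the TaskAttemptContext argument, and the order in which write() calls the hooks are all assumptions, not the proposed implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for the proposed MultipleOutputs. Only the hook
 * methods mirror the proposal; the context argument is dropped and an
 * in-memory map replaces the real record writers.
 */
class MultipleOutputsSketch<KEYOUT, VALUEOUT> {
  // file name -> records "written" to that file
  private final Map<String, List<String>> files = new LinkedHashMap<>();
  private final String leafName;

  MultipleOutputsSketch(String leafName) {
    this.leafName = leafName;
  }

  // Default: leave the leaf file name unchanged.
  protected String generateLeafFileName(String name) {
    return name;
  }

  // Default: the file name does not depend on key or value.
  protected String generateFileName(KEYOUT key, VALUEOUT value, String name) {
    return name;
  }

  // Default: the actual key is the given key.
  protected KEYOUT generateActualKey(KEYOUT key, VALUEOUT value) {
    return key;
  }

  // Default: the actual value is the given value.
  protected VALUEOUT generateActualValue(KEYOUT key, VALUEOUT value) {
    return value;
  }

  // Assumed hook order: leaf name, then file name, then actual key/value.
  public void write(KEYOUT key, VALUEOUT value) {
    String file = generateFileName(key, value, generateLeafFileName(leafName));
    KEYOUT actualKey = generateActualKey(key, value);
    VALUEOUT actualValue = generateActualValue(key, value);
    files.computeIfAbsent(file, f -> new ArrayList<>())
        .add(actualKey + "\t" + actualValue);
  }

  Map<String, List<String>> files() {
    return files;
  }
}

/** Hypothetical subclass: routes each record into a directory named after its key. */
class KeyPartitionedOutputs extends MultipleOutputsSketch<String, Integer> {
  KeyPartitionedOutputs(String leafName) {
    super(leafName);
  }

  @Override
  protected String generateFileName(String key, Integer value, String name) {
    return key + "/" + name;
  }
}

class MultipleOutputsDemo {
  public static void main(String[] args) {
    KeyPartitionedOutputs out = new KeyPartitionedOutputs("part-r-00000");
    out.write("us", 1);
    out.write("in", 2);
    out.write("us", 3);
    // Prints the file names and the records grouped under each one.
    System.out.println(out.files());
  }
}
```

In this sketch a user only overrides generateFileName to get per-key output directories, which is the use case MultipleOutputFormat covers today; the other hooks keep their pass-through defaults.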


Thoughts?

> Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-370
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
