[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030121#comment-13030121
 ] 

Owen O'Malley commented on MAPREDUCE-2454:
------------------------------------------

The map output key and value types are controlled by the application, not the 
framework. A plugin that can only sort Text objects isn't general-purpose 
enough. Even streaming caused users a lot of trouble by requiring the data to 
be UTF-8 encoded. 

The only acceptable solution would be to define the API below and refactor the 
current code into a default plugin.

I hadn't thought enough about the combiner. It requires an inversion of control, 
since the combiner is started by the spill: the plugin decides when to spill, 
so it has to call back into the framework to run the combiner.

{code:title=SortPlugin}
package org.apache.hadoop.mapreduce.task;

import java.io.IOException;

public abstract class SortPlugin {

  public interface CombinerCallback {
    /** Called once for each partition of the map output. */
    void runCombiner(RawRecordReader reader,
                     RawRecordWriter writer
                    ) throws IOException, InterruptedException;
  }

  /** Called once in the map task to create the collector that gathers
      the output coming from the map. */
  public abstract RawRecordWriter createRawRecordWriter()
    throws IOException, InterruptedException;

  /** Called once in the map task, if there is a combiner. */
  public abstract void registerCombinerCallback(CombinerCallback callback)
    throws IOException, InterruptedException;

  /** Called once in the reduce task to create the iterator that provides
      the input to the reduce. */
  public abstract RawRecordReader createRawRecordReader()
    throws IOException, InterruptedException;
}
{code}
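
To make the inversion of control concrete, here is a rough, self-contained 
sketch of how a plugin might drive the combiner from its own spill. It is not 
the Hadoop implementation. RawRecordReader and RawRecordWriter are named above 
but not defined, so the shapes used here (next/getKey/getValue and write) are 
assumptions, as are InMemorySorter, SUM, readerOver and compareBytes. The 
sorter handles a single partition, keeps everything in memory, and compares 
keys as raw bytes so that nothing depends on Text.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortPluginSketch {

  // Hypothetical shapes: the proposal names these types without defining them.
  interface RawRecordWriter {
    void write(byte[] key, byte[] value) throws IOException, InterruptedException;
  }

  interface RawRecordReader {
    boolean next() throws IOException, InterruptedException;
    byte[] getKey();
    byte[] getValue();
  }

  interface CombinerCallback {
    void runCombiner(RawRecordReader reader, RawRecordWriter writer)
        throws IOException, InterruptedException;
  }

  /** Toy single-partition sorter that spills once the buffer holds `limit` records. */
  static class InMemorySorter {
    final List<byte[][]> spilled = new ArrayList<>(); // stands in for a spill file
    int spillCount = 0;
    private final List<byte[][]> buffer = new ArrayList<>();
    private final int limit;
    private CombinerCallback combiner; // stays null when the job has no combiner

    InMemorySorter(int limit) { this.limit = limit; }

    void registerCombinerCallback(CombinerCallback callback) { combiner = callback; }

    RawRecordWriter createRawRecordWriter() {
      return (key, value) -> {
        buffer.add(new byte[][] {key, value});
        if (buffer.size() >= limit) spill(); // the sorter, not the map task, picks the moment
      };
    }

    private void spill() throws IOException, InterruptedException {
      List<byte[][]> sorted = new ArrayList<>(buffer);
      buffer.clear();
      sorted.sort((a, b) -> compareBytes(a[0], b[0]));
      RawRecordWriter out = (k, v) -> spilled.add(new byte[][] {k, v});
      if (combiner != null) {
        combiner.runCombiner(readerOver(sorted), out); // inversion of control
      } else {
        for (byte[][] kv : sorted) out.write(kv[0], kv[1]);
      }
      spillCount++;
    }
  }

  /** Example combiner: sums one-byte counts over each run of equal keys. */
  static final CombinerCallback SUM = (reader, writer) -> {
    byte[] current = null;
    int sum = 0;
    while (reader.next()) {
      if (current != null && !Arrays.equals(current, reader.getKey())) {
        writer.write(current, new byte[] {(byte) sum});
        sum = 0;
      }
      current = reader.getKey();
      sum += reader.getValue()[0];
    }
    if (current != null) writer.write(current, new byte[] {(byte) sum});
  };

  static RawRecordReader readerOver(List<byte[][]> records) {
    return new RawRecordReader() {
      private int pos = -1;
      public boolean next() { return ++pos < records.size(); }
      public byte[] getKey() { return records.get(pos)[0]; }
      public byte[] getValue() { return records.get(pos)[1]; }
    };
  }

  /** Unsigned lexicographic comparison of raw key bytes. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) return d;
    }
    return a.length - b.length;
  }

  public static void main(String[] args) throws Exception {
    InMemorySorter sorter = new InMemorySorter(4);
    sorter.registerCombinerCallback(SUM);
    RawRecordWriter writer = sorter.createRawRecordWriter();
    byte[][] keys = { {2}, {1}, {2}, {1} };
    for (byte[] key : keys) writer.write(key, new byte[] {1});
    for (byte[][] kv : sorter.spilled) {
      System.out.println(Arrays.toString(kv[0]) + " -> " + kv[1][0]);
    }
  }
}
```

The map task in main only ever calls write(). The fourth write fills the 
buffer, and the sorter itself sorts the records and invokes the registered 
callback, which is why the callback has to be registered up front rather than 
called by the framework.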

> Allow external sorter plugin for MR
> -----------------------------------
>
>                 Key: MAPREDUCE-2454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Mariappan Asokan
>            Priority: Minor
>         Attachments: KeyValueIterator.java, MapOutputSorter.java, 
> MapOutputSorterAbstract.java, ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to 
> facilitate external sorter plugins both on the Map and Reduce sides.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
