[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR

Owen O'Malley (JIRA) Thu, 05 May 2011 08:47:46 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029386#comment-13029386
 ]


Owen O'Malley commented on MAPREDUCE-2454:
------------------------------------------

Actually, I think I made a mistake in pushing the objects into the interface, 
especially since I plan to change the serialization layer. I think it would be 
better to do:

{code title=RawRecordWriter.java}
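package org.apache.hadoop.mapreduce.task;

import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.mapreduce.TaskAttemptContext;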

public abstract class RawRecordWriter implements Closeable {
  /**
   * Called once at start of processing
   */
  public abstract void initialize(TaskAttemptContext context
                                  ) throws IOException, InterruptedException;

  /**
   * Called once per record. The key and value will be copied before write
   * returns.
   */
  public abstract void write(int partition, ByteBuffer key, ByteBuffer value
                             ) throws IOException, InterruptedException;

  /**
   * Called once at task finish or failure.
   */
  public abstract void close() throws IOException;
}
{code}
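
To make the copy-before-return contract concrete, here is a minimal sketch of what a plugin could do behind write(). It is an illustration only, not part of the proposal: InMemorySortingWriter, Record, and sorted() are hypothetical names, the class does not extend RawRecordWriter (so it compiles without Hadoop and skips initialize and close), and it assumes keys sort by unsigned lexicographic byte order within a partition.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a sorter plugin; not a Hadoop class. */
final class InMemorySortingWriter {
  /** One record, holding private copies of the key and value bytes. */
  static final class Record {
    final int partition;
    final byte[] key;
    final byte[] value;

    Record(int partition, byte[] key, byte[] value) {
      this.partition = partition;
      this.key = key;
      this.value = value;
    }
  }

  private final List<Record> records = new ArrayList<>();

  /** Copies the remaining bytes without moving the caller's position. */
  private static byte[] copy(ByteBuffer buf) {
    byte[] out = new byte[buf.remaining()];
    buf.duplicate().get(out);
    return out;
  }

  /** Mirrors the proposed write(): both buffers are copied before returning. */
  public void write(int partition, ByteBuffer key, ByteBuffer value) {
    records.add(new Record(partition, copy(key), copy(value)));
  }

  /** Unsigned lexicographic comparison of raw key bytes (an assumption). */
  private static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  /** Returns the records ordered by partition, then by key bytes. */
  public List<Record> sorted() {
    List<Record> out = new ArrayList<>(records);
    out.sort((a, b) -> a.partition != b.partition
        ? Integer.compare(a.partition, b.partition)
        : compareBytes(a.key, b.key));
    return out;
  }
}
```

Because write() copies, the caller is free to reuse or overwrite its buffers as soon as the call returns. A real plugin would presumably spill to disk rather than hold everything in a list.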

For the Reduce side, we could just use the RawKeyValueIterator, but I suspect 
we'll be in better shape if we do something similar:

{code title=RawRecordReader.java}
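package org.apache.hadoop.mapreduce.task;

import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.mapreduce.TaskAttemptContext;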

public abstract class RawRecordReader implements Closeable {
  /**
   * Called once at start of processing
   */
  public abstract void initialize(TaskAttemptContext context
                                  ) throws IOException, InterruptedException;

  /**
   * Advance to the next record. Returns false when there are no more records.
   */
  public abstract boolean next() throws IOException, InterruptedException;

  /**
   * Provides the ByteBuffer with the key. The ByteBuffer may be reused after
   * each call to next.
   */
  public abstract ByteBuffer getKey() throws IOException, InterruptedException;

  /**
   * Provides the ByteBuffer with the value. The ByteBuffer may be reused after
   * each call to next.
   */
  public abstract ByteBuffer getValue()
      throws IOException, InterruptedException;

  /**
   * Called once at task finish or failure.
   */
  public abstract void close() throws IOException;  
}
{code}
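
The buffer-reuse note on getKey and getValue matters to callers, so here is a minimal sketch of a reader that recycles its buffers. Again this is an illustration under assumptions: ReusingReader is a hypothetical name, it reads from an in-memory list of {key, value} byte arrays, and it does not extend RawRecordReader, so it compiles without Hadoop and omits initialize and close.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.List;

/** Hypothetical reader over in-memory records; not a Hadoop class. */
final class ReusingReader {
  private final Iterator<byte[][]> it;
  private final ByteBuffer keyBuf;
  private final ByteBuffer valueBuf;

  /** Each record is {key, value}; capacity must fit the largest of either. */
  ReusingReader(List<byte[][]> records, int capacity) {
    this.it = records.iterator();
    this.keyBuf = ByteBuffer.allocate(capacity);
    this.valueBuf = ByteBuffer.allocate(capacity);
  }

  /** Overwrites the buffer with new data and makes it ready for reading. */
  private static void fill(ByteBuffer buf, byte[] data) {
    buf.clear();
    buf.put(data);
    buf.flip();
  }

  /** Advances to the next record, overwriting the shared buffers. */
  public boolean next() {
    if (!it.hasNext()) {
      return false;
    }
    byte[][] rec = it.next();
    fill(keyBuf, rec[0]);
    fill(valueBuf, rec[1]);
    return true;
  }

  /** Read-only view of the current key; its contents change on next(). */
  public ByteBuffer getKey() {
    return keyBuf.asReadOnlyBuffer();
  }

  /** Read-only view of the current value; its contents change on next(). */
  public ByteBuffer getValue() {
    return valueBuf.asReadOnlyBuffer();
  }
}
```

A caller that needs a key or value after the following next() call has to copy the bytes out first, since the views share storage with the reader's internal buffers.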

This has a couple of advantages:
* The plugin gets the TaskAttemptContext and the configuration.
* Serialization stays part of MapReduce instead of the sort library.

> Allow external sorter plugin for MR
> -----------------------------------
>
>                 Key: MAPREDUCE-2454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Mariappan Asokan
>            Priority: Minor
>         Attachments: KeyValueIterator.java, MapOutputSorter.java, 
> MapOutputSorterAbstract.java, ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to 
> facilitate external sorter plugins both on the Map and Reduce sides.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
