[
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029386#comment-13029386
]
Owen O'Malley commented on MAPREDUCE-2454:
------------------------------------------
Actually, I think I made a mistake in pushing the objects into the interface,
especially since I plan to change the serialization layer. I think it would be
better to do:
{code title=RawRecordWriter.java}
package org.apache.hadoop.mapreduce.task;

import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.mapreduce.TaskAttemptContext;

public abstract class RawRecordWriter implements Closeable {
  /**
   * Called once at start of processing.
   */
  public abstract void initialize(TaskAttemptContext context
                                  ) throws IOException, InterruptedException;
  /**
   * Called once per record. The key and value will be copied before write
   * returns.
   */
  public abstract void write(int partition, ByteBuffer key, ByteBuffer value
                             ) throws IOException, InterruptedException;
  /**
   * Called once at task finish or failure.
   */
  public abstract void close() throws IOException;
}
{code}
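To illustrate the copy contract on write ("the key and value will be copied before write returns"), here is a minimal sketch. It is not part of the proposal: CopyingWriterSketch and its Record class are hypothetical names, and the class mirrors only the write() signature so that it runs without Hadoop on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: mirrors only the write() contract of the proposed
// RawRecordWriter, without the TaskAttemptContext dependency.
public class CopyingWriterSketch {

  /** One buffered record; key and value are private copies. */
  static final class Record {
    final int partition;
    final byte[] key;
    final byte[] value;

    Record(int partition, byte[] key, byte[] value) {
      this.partition = partition;
      this.key = key;
      this.value = value;
    }
  }

  private final List<Record> records = new ArrayList<>();

  /** Copies the remaining bytes without moving the caller's position. */
  static byte[] copy(ByteBuffer buf) {
    byte[] out = new byte[buf.remaining()];
    buf.duplicate().get(out);
    return out;
  }

  /** Copies key and value before returning, so the caller may reuse both. */
  public void write(int partition, ByteBuffer key, ByteBuffer value) {
    records.add(new Record(partition, copy(key), copy(value)));
  }

  public List<Record> records() {
    return records;
  }

  public static void main(String[] args) {
    CopyingWriterSketch writer = new CopyingWriterSketch();
    ByteBuffer key = ByteBuffer.wrap(new byte[] {'k', '1'});
    ByteBuffer value = ByteBuffer.wrap(new byte[] {'v', '1'});
    writer.write(0, key, value);

    // The caller overwrites its buffer after write() has returned.
    key.put(0, (byte) 'x');

    // The stored copy is unaffected.
    Record first = writer.records().get(0);
    System.out.println(new String(first.key, StandardCharsets.US_ASCII)); // prints "k1"
  }
}
```

Because the writer owns a copy, the caller can keep a single pair of buffers and refill them for every record.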
For the Reduce side, we could just use the RawKeyValueIterator, but I suspect
we'll be in better shape if we do something similar:
{code title=RawRecordReader.java}
package org.apache.hadoop.mapreduce.task;

import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.mapreduce.TaskAttemptContext;

public abstract class RawRecordReader implements Closeable {
  /**
   * Called once at start of processing.
   */
  public abstract void initialize(TaskAttemptContext context
                                  ) throws IOException, InterruptedException;
  /**
   * Advance to the next record. Returns false when there are no more records.
   */
  public abstract boolean next() throws IOException, InterruptedException;
  /**
   * Provides the ByteBuffer with the key. The ByteBuffer may be reused after
   * each call to next.
   */
  public abstract ByteBuffer getKey() throws IOException, InterruptedException;
  /**
   * Provides the ByteBuffer with the value. The ByteBuffer may be reused after
   * each call to next.
   */
  public abstract ByteBuffer getValue() throws IOException,
                                               InterruptedException;
  /**
   * Called once at task finish or failure.
   */
  public abstract void close() throws IOException;
}
{code}
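Since the ByteBuffers may be reused after each call to next, a consumer that wants to keep a record has to copy it before advancing. The sketch below shows that loop. RawRecords, ReusingReader, and drain are hypothetical names used only for illustration; the interface mirrors just the next()/getKey()/getValue() part of the class above so that it runs without Hadoop.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReaderLoopSketch {

  /** Hypothetical stand-in for the iteration part of RawRecordReader. */
  interface RawRecords {
    boolean next();
    ByteBuffer getKey();
    ByteBuffer getValue();
  }

  /** Toy reader that deliberately reuses one key and one value buffer. */
  static final class ReusingReader implements RawRecords {
    private final String[][] data;
    private int index = -1;
    private final ByteBuffer key = ByteBuffer.allocate(64);
    private final ByteBuffer value = ByteBuffer.allocate(64);

    ReusingReader(String[][] data) {
      this.data = data;
    }

    public boolean next() {
      if (++index >= data.length) {
        return false;
      }
      fill(key, data[index][0]);
      fill(value, data[index][1]);
      return true;
    }

    public ByteBuffer getKey() {
      return key;
    }

    public ByteBuffer getValue() {
      return value;
    }

    /** Overwrites the buffer contents in place, as a real reader might. */
    private static void fill(ByteBuffer buf, String s) {
      buf.clear();
      buf.put(s.getBytes(StandardCharsets.UTF_8));
      buf.flip();
    }
  }

  /** Copies the remaining bytes out of a buffer the reader still owns. */
  static String copyToString(ByteBuffer buf) {
    byte[] out = new byte[buf.remaining()];
    buf.duplicate().get(out);
    return new String(out, StandardCharsets.UTF_8);
  }

  /** Reads every record, copying key and value before the next advance. */
  static List<String> drain(RawRecords reader) {
    List<String> result = new ArrayList<>();
    while (reader.next()) {
      result.add(copyToString(reader.getKey()) + "="
          + copyToString(reader.getValue()));
    }
    return result;
  }

  public static void main(String[] args) {
    RawRecords reader =
        new ReusingReader(new String[][] {{"a", "1"}, {"b", "2"}});
    System.out.println(drain(reader)); // prints "[a=1, b=2]"
  }
}
```

Holding on to the ByteBuffer objects themselves instead of copying would leave every saved entry pointing at the last record read.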
This has a couple of advantages:
* The plugin gets the TaskAttemptContext and the configuration.
* Serialization stays part of MapReduce instead of the sort library.
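The second point can be pictured as follows: the framework turns typed records into bytes and picks the partition, and the sorter plugin only ever sees raw ByteBuffers. This is a rough sketch under assumptions, not the planned serialization layer. RawSink and collect are hypothetical names, the String/long record type is made up, and the partition formula is a plain hash-modulo chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FrameworkSerializationSketch {

  /** Hypothetical plugin-facing sink: sees only a partition and raw bytes. */
  interface RawSink {
    void write(int partition, ByteBuffer key, ByteBuffer value);
  }

  /** Framework side: serializes the key as UTF-8 bytes (an assumption). */
  static ByteBuffer serializeKey(String key) {
    return ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8));
  }

  /** Framework side: serializes the value as 8 big-endian bytes. */
  static ByteBuffer serializeValue(long value) {
    // Absolute put leaves position at 0, so all 8 bytes remain readable.
    return ByteBuffer.allocate(Long.BYTES).putLong(0, value);
  }

  /** Serializes and partitions in the framework, then hands bytes to the plugin. */
  static void collect(String key, long value, int numPartitions, RawSink sink) {
    int partition = (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    sink.write(partition, serializeKey(key), serializeValue(value));
  }

  public static void main(String[] args) {
    // The plugin never learns that the key was a String or the value a long.
    RawSink sink = (partition, key, value) ->
        System.out.println("partition=" + partition
            + " keyBytes=" + key.remaining()
            + " valueBytes=" + value.remaining());
    collect("apple", 42L, 4, sink);
  }
}
```

With this split, changing the serialization format touches only the framework side and leaves the plugin code alone.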
> Allow external sorter plugin for MR
> -----------------------------------
>
> Key: MAPREDUCE-2454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Mariappan Asokan
> Priority: Minor
> Attachments: KeyValueIterator.java, MapOutputSorter.java,
> MapOutputSorterAbstract.java, ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to
> facilitate external sorter plugins both on the Map and Reduce sides.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira