[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avner BenHanoch updated MAPREDUCE-4049:
---------------------------------------

    Attachment: Hadoop Shuffle Plugin Design.rtf


Attached is the updated design document.
For convenience, its essence is summarized below:

 * Support third-party _ShuffleConsumerPlugin_ implementations that can 
retrieve map outputs from either the built-in _ShuffleHandler_ or a 3rd party 
Shuffle Provider Service (which can be loaded as an _AuxiliaryService_ at run 
time).
 * Update Hadoop code, so the current _Shuffle_ class will be the first (and 
the default) _ShuffleConsumerPlugin_.
 * Allow users to configure their desired _ShuffleConsumerPlugin_ in 
_mapred.xml_ files.
 * Change _ReduceTask_ to *dynamically load* the user configured 
_ShuffleConsumerPlugin_ and use it for its shuffle needs.
 * The interface of _ShuffleConsumerPlugin_ will be modeled on the interface 
of the existing *Shuffle* class.  *Hence, there will be minimal changes to the 
current code.*
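
To illustrate the "configure, then dynamically load" steps in the list above, here is a minimal sketch in plain Java. It is not the patch itself: the class names, the configuration key _example.shuffle.consumer.plugin.class_ and the use of a plain Map in place of JobConf are all assumptions made only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the proposed plugin base class (no Hadoop types).
abstract class ConsumerPluginSketch {
  // Hypothetical configuration key, used only in this example.
  static final String PLUGIN_KEY = "example.shuffle.consumer.plugin.class";

  abstract String run();

  // Loads the configured plugin by class name, or falls back to the default
  // implementation when nothing is configured.
  static ConsumerPluginSketch load(Map<String, String> conf) {
    String name = conf.get(PLUGIN_KEY);
    if (name == null) {
      return new DefaultShuffleSketch();
    }
    try {
      Class<? extends ConsumerPluginSketch> clazz =
          Class.forName(name).asSubclass(ConsumerPluginSketch.class);
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot load plugin " + name, e);
    }
  }
}

// Plays the role of the built-in (default) shuffle consumer.
class DefaultShuffleSketch extends ConsumerPluginSketch {
  @Override
  String run() {
    return "default";
  }
}

// Plays the role of a 3rd party consumer, for example an RDMA based one.
class RdmaShuffleSketch extends ConsumerPluginSketch {
  @Override
  String run() {
    return "rdma";
  }
}

class PluginLoadingDemo {
  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(ConsumerPluginSketch.load(conf).run());
    conf.put(ConsumerPluginSketch.PLUGIN_KEY, RdmaShuffleSketch.class.getName());
    System.out.println(ConsumerPluginSketch.load(conf).run());
  }
}
```

With this shape the caller only sees the abstract type, so switching the shuffle implementation is purely a configuration change.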


{code:title=ShuffleConsumerPlugin.java|borderStyle=solid}
/**
 * ShuffleConsumerPlugin for serving Reducers.  It may shuffle MOF files from
 * either the built-in ShuffleHandler or from a 3rd party AuxiliaryService.
 */
public abstract class ShuffleConsumerPlugin<K, V> {

  public abstract void init(ShuffleContext<K, V> context);

  public abstract RawKeyValueIterator run()
      throws IOException, InterruptedException;

  public void close(){}

  /**
   * Factory method for getting a ShuffleConsumerPlugin from the given class
   * object and configuring it.  If clazz is null, this method will return an
   * instance of 'Shuffle' class since it is the default ShuffleConsumerPlugin 
   * 
   * @param clazz class of the requested ShuffleConsumerPlugin
   * @param conf configure the plugin with this
   * @return an instance of ShuffleConsumerPlugin
   */
  public static ShuffleConsumerPlugin getShuffleConsumerPlugin(
    Class<? extends ShuffleConsumerPlugin> clazz, 
    JobConf conf) throws ClassNotFoundException, IOException {
        // ...          
  }
}
{code}
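
As a rough illustration of how a caller such as _ReduceTask_ might drive the init / run / close lifecycle of the class above, here is a small self-contained sketch. It is an assumption about usage, not code from the patch: Hadoop types are replaced with a List (standing in for ShuffleContext) and an Iterator (standing in for RawKeyValueIterator), and all class names are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for the proposed abstract class, without Hadoop types.
abstract class PluginLifecycleSketch<K> {
  abstract void init(List<K> context);

  abstract Iterator<K> run();

  // No-op by default, mirroring the empty close() in the proposal.
  void close() {}
}

// Toy plugin that just iterates over the segments it was initialized with.
class InMemoryPluginSketch extends PluginLifecycleSketch<String> {
  private List<String> segments;
  boolean closed = false;

  @Override
  void init(List<String> context) {
    this.segments = context;
  }

  @Override
  Iterator<String> run() {
    return segments.iterator();
  }

  @Override
  void close() {
    closed = true;
  }
}

// Hypothetical caller: init once, consume the iterator, always close.
class ReducerDriverSketch {
  static String drive(PluginLifecycleSketch<String> plugin, List<String> context) {
    StringBuilder out = new StringBuilder();
    plugin.init(context);
    try {
      Iterator<String> it = plugin.run();
      while (it.hasNext()) {
        out.append(it.next());
      }
    } finally {
      plugin.close(); // runs even if consuming the iterator fails
    }
    return out.toString();
  }
}

class LifecycleDemo {
  public static void main(String[] args) {
    InMemoryPluginSketch plugin = new InMemoryPluginSketch();
    System.out.println(ReducerDriverSketch.drive(plugin, Arrays.asList("a", "b", "c")));
  }
}
```

Placing close() in a finally block is a design choice of this sketch; it lets a plugin release network or memory resources even when the shuffle fails part way through.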
                
> plugin for generic shuffle service
> ----------------------------------
>
>                 Key: MAPREDUCE-4049
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: performance, task, tasktracker
>    Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
>            Reporter: Avner BenHanoch
>              Labels: merge, plugin, rdma, shuffle
>         Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
> mapreduce-4049.patch, mapreduce-4049.patch
>
>
> Support generic shuffle service as set of two plugins: ShuffleProvider & 
> ShuffleConsumer.
> This will satisfy the following needs:
> # Better shuffle and merge performance. For example: we are working on 
> shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
> or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
> RDMA shuffle, the plugin can also utilize a suitable merge approach during 
> the intermediate merges. Hence, getting much better performance.
> # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
> dependency of NodeManager with a specific version of mapreduce shuffle 
> (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
> from Auburn University with others, 
> [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching 2 documents with suggested Top Level Design for both plugins 
> (currently, based on 1.0 branch)

