[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029349#comment-13029349
 ] 

Mariappan Asokan commented on MAPREDUCE-2454:
---------------------------------------------

Hi Steve,
    Thank you very much for your comments.  I will try to make the sorting on 
both the Map and Reduce sides pluggable.  The default implementation will be 
whatever the framework currently provides.  Separating the sorting process on 
the Map side is easy (currently all the code is in the class MapOutputBuffer, 
which lives in MapTask.java).  Separating the merge on the Reduce side is much 
harder because of the way it is coded, but I am working on separating that as 
well.
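
To illustrate the pluggable arrangement described above, here is a minimal 
sketch.  The names RecordSorter, DefaultRecordSorter and SorterSelector are 
hypothetical and are not the interfaces in the attached files; the default 
class only stands in for the framework's built-in sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical plugin contract, used only for illustration.
interface RecordSorter<K> {
    List<K> sort(List<K> keys, Comparator<? super K> comparator);
}

// Stand-in for the framework's own sort, used when no plugin is configured.
class DefaultRecordSorter<K> implements RecordSorter<K> {
    @Override
    public List<K> sort(List<K> keys, Comparator<? super K> comparator) {
        List<K> copy = new ArrayList<>(keys); // leave the caller's list untouched
        copy.sort(comparator);
        return copy;
    }
}

class SorterSelector {
    // Returns the configured plugin, or the default when none is supplied.
    static <K> RecordSorter<K> choose(RecordSorter<K> configured) {
        return configured != null ? configured : new DefaultRecordSorter<K>();
    }
}
```

With this shape, a job that configures nothing keeps today's behaviour, and 
a plugin only has to implement the one interface.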

Regarding the GNU sort plugin: I am making the external sort command name 
configurable, so it can be any POSIX sort command.  Since most Hadoop 
installations are Linux based, GNU sort is what provides the POSIX sort 
command there.  Other UNIX installations can use their own POSIX sort command 
as the external sorter.  There is no GPL issue.  Perhaps I can remove the word 
GNU and just call it UNIX.
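
A rough sketch of what a configurable command name could look like follows.  
The property name example.external.sort.command and the class 
ExternalSortCommand are made up for this illustration, and java.util.Properties 
stands in for the job configuration; none of this is taken from the patch.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class ExternalSortCommand {
    // Hypothetical property name, not an actual Hadoop configuration key.
    static final String COMMAND_KEY = "example.external.sort.command";

    // Falls back to the plain POSIX command name when nothing is configured.
    static String commandName(Properties conf) {
        return conf.getProperty(COMMAND_KEY, "sort");
    }

    // Pipes the lines through the configured command and returns its output.
    static List<String> sortLines(Properties conf, List<String> lines)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(commandName(conf));
        builder.environment().put("LC_ALL", "C"); // ask for byte-wise collation
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // sort emits nothing until its input is closed, so writing everything
        // first and reading afterwards does not deadlock for this command.
        try (BufferedWriter in = new BufferedWriter(new OutputStreamWriter(
                process.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                in.write(line);
                in.newLine();
            }
        }

        List<String> sorted = new ArrayList<>();
        try (BufferedReader out = new BufferedReader(new InputStreamReader(
                process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = out.readLine()) != null) {
                sorted.add(line);
            }
        }
        if (process.waitFor() != 0) {
            throw new IOException("external sort exited with an error");
        }
        return sorted;
    }
}
```

Because only the command name is configurable, the same code path would 
serve GNU sort on Linux and the native sort on other UNIX systems.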

Regarding class loader related exceptions: I will look at the framework's code 
to see what it does when it loads a Mapper or Reducer class, and follow the 
same approach since the scenario is very similar.  All the issues you have 
raised w.r.t. class loading apply there as well.
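
As an illustration of the failure modes involved, the sketch below loads a 
plugin by class name with plain Java reflection and reports every loading 
problem as one exception.  SorterPlugin, BuiltInSorter and PluginLoader are 
hypothetical names; this is not how the framework loads Mapper or Reducer 
classes, only an example of the cases that need handling.

```java
// Hypothetical plugin type, used only for illustration.
interface SorterPlugin {
    String name();
}

class BuiltInSorter implements SorterPlugin {
    @Override
    public String name() {
        return "built-in";
    }
}

class PluginLoader {
    // Loads and instantiates the named class, which must implement
    // SorterPlugin and have a no-argument constructor.
    static SorterPlugin load(String className) {
        try {
            Class<?> raw = Class.forName(className);
            return raw.asSubclass(SorterPlugin.class)
                      .getDeclaredConstructor()
                      .newInstance();
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            // Covers a missing class, a missing or inaccessible constructor,
            // a failing constructor or static initializer, and a class that
            // does not implement the plugin interface.
            throw new IllegalArgumentException(
                    "Cannot load sorter plugin " + className, e);
        }
    }
}
```

Wrapping the causes in a single exception keeps the original failure 
available through getCause() while giving the caller one case to handle.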

An explanation of UnsupportedOperationException: if the external sorter uses a 
UNIX command like sort, it may not be able to handle a custom key type the 
user has defined, since the key comparator may be written in Java.  In such a 
case a message will be logged in syslog and the framework's sorter will be 
used.  I think this is fair enough.  Please let me know if you think otherwise.
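
The fallback described above could look roughly like the sketch below.  All 
class names are hypothetical, keys are simplified to strings, and 
CommandBackedSorter merely imitates an external command by refusing any custom 
comparator; java.util.logging stands in for the task's syslog.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical contract: a null comparator means plain byte/text order.
interface KeySorter {
    List<String> sort(List<String> keys, Comparator<String> custom);
}

// Stand-in for the framework's sorter, which can run any Java comparator.
class FrameworkSorter implements KeySorter {
    @Override
    public List<String> sort(List<String> keys, Comparator<String> custom) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort(custom); // a null comparator falls back to natural ordering
        return copy;
    }
}

// Imitates a sorter backed by a UNIX command: it cannot call Java comparators.
class CommandBackedSorter implements KeySorter {
    @Override
    public List<String> sort(List<String> keys, Comparator<String> custom) {
        if (custom != null) {
            throw new UnsupportedOperationException(
                    "external command cannot apply a Java comparator");
        }
        List<String> copy = new ArrayList<>(keys);
        copy.sort(null); // placeholder for piping the keys through the command
        return copy;
    }
}

// Tries the external sorter first and falls back to the framework's sorter.
class FallbackSorter implements KeySorter {
    private static final Logger LOG = Logger.getLogger("FallbackSorter");
    private final KeySorter external = new CommandBackedSorter();
    private final KeySorter framework = new FrameworkSorter();

    @Override
    public List<String> sort(List<String> keys, Comparator<String> custom) {
        try {
            return external.sort(keys, custom);
        } catch (UnsupportedOperationException e) {
            LOG.warning("External sorter unavailable for this key type ("
                    + e.getMessage() + "); using the framework's sorter");
            return framework.sort(keys, custom);
        }
    }
}
```

The job still completes when the key type is unsupported; the only visible 
difference is the logged message and the loss of the external sorter's speed.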

When I am done with the implementation (on top of MAPREDUCE-279) and testing, 
I will post a patch file for review.  Would you be interested in working with 
me as a committer?

Thank you.

> Allow external sorter plugin for MR
> -----------------------------------
>
>                 Key: MAPREDUCE-2454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Mariappan Asokan
>            Priority: Minor
>         Attachments: KeyValueIterator.java, MapOutputSorter.java, 
> MapOutputSorterAbstract.java, ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to 
> facilitate external sorter plugins both on the Map and Reduce sides.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
