[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-4808:
-------------------------------------

    Status: Open  (was: Patch Available)

Asokan, sorry for the delay; I've been away traveling home during the 
holidays.

I have more comments, but I'll put some here to keep the discussion going.

Thanks for the design doc, but I was looking for thoughts on *how* the plugin 
was going to be used for the use-cases you've mentioned (hash-join etc.), 
design alternatives, etc.

IAC, taking a step back, the 'goal' here is to make the 'merge' pluggable.

Reduce-side has 2 pieces:
# Shuffle - Move data from maps to the reduce.
# Merge - Merge already sorted map-outputs.
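
To make the second piece concrete, here is a minimal sketch of merging already sorted runs with a heap. It is not the actual reduce-task code: the class name SortedRunMerger is hypothetical, and plain in-memory lists of strings stand in for real map-output segments.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not a Hadoop class.
public class SortedRunMerger {

    /** One cursor per sorted run: its current head plus the remainder of the run. */
    private static final class Cursor implements Comparable<Cursor> {
        String head;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.head = it.next(); // only constructed for non-empty runs
        }

        @Override
        public int compareTo(Cursor other) {
            return head.compareTo(other.head);
        }
    }

    /** Merges runs that are each already sorted into one sorted list. */
    public static List<String> merge(List<List<String>> sortedRuns) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (List<String> run : sortedRuns) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();   // run with the smallest head
            out.add(smallest.head);
            if (smallest.rest.hasNext()) {   // advance that run and re-queue it
                smallest.head = smallest.rest.next();
                heap.add(smallest);
            }
        }
        return out;
    }
}
```

Nothing in this sketch depends on how the runs arrived at the reduce, which is the sense in which Shuffle and Merge are separate pieces.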

The rest (MergeManager etc.) are merely implementation details to manage memory 
etc., which are irrelevant in several scenarios as soon as we consider 
alternatives to the current HTTP-based shuffle (several alternatives exist, 
such as RDMA).

Your current approach tries to encapsulate and enshrine the current 
implementation of the reduce task, which I'm not wild about. By this I mean, 
you are focussing too much on the current state and trying to make interfaces 
which are unnecessary for now and might not suffice for the future.

I really don't think we should be tying Shuffle & Merge together as you have 
done by introducing yet another new interface (regardless of whether it's 
public or not).


As I've noted above, adding a simple 'Merge' interface with one 'merge' call 
will address all of the use-cases you have outlined. If not, let's discuss.
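
As a starting point for that discussion, a rough sketch of what a single-call interface could look like. Every name here (MergeSketch, Merge, ConcatMerge, drain) is hypothetical and none of it is an existing Hadoop API; iterators over in-memory values stand in for fetched map outputs. The example plugin deliberately does no sorting, on the assumption that some consumers (a hash-join might be one) do not need sorted input.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; not a Hadoop API.
public class MergeSketch {

    /** The whole plugin surface: one call from fetched map outputs to reducer input. */
    public interface Merge<T> {
        Iterator<T> merge(List<Iterator<T>> mapOutputs);
    }

    /** A plugin that skips sorting and simply chains the map outputs in order. */
    public static final class ConcatMerge<T> implements Merge<T> {
        @Override
        public Iterator<T> merge(List<Iterator<T>> mapOutputs) {
            final Iterator<Iterator<T>> segments = mapOutputs.iterator();
            return new Iterator<T>() {
                private Iterator<T> current = null;

                @Override
                public boolean hasNext() {
                    // Skip exhausted or empty segments until one has data.
                    while ((current == null || !current.hasNext()) && segments.hasNext()) {
                        current = segments.next();
                    }
                    return current != null && current.hasNext();
                }

                @Override
                public T next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return current.next();
                }
            };
        }
    }

    /** Stand-in for the reduce side: it sees only the Merge interface. */
    public static <T> List<T> drain(Merge<T> plugin, List<Iterator<T>> mapOutputs) {
        List<T> out = new ArrayList<>();
        Iterator<T> merged = plugin.merge(mapOutputs);
        while (merged.hasNext()) {
            out.add(merged.next());
        }
        return out;
    }
}
```

The caller in drain knows nothing about how the outputs were fetched or how memory is managed, so a sort-merge, a concatenation, or anything else can sit behind the same call.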

                
> Allow reduce-side merge to be pluggable
> ---------------------------------------
>
>                 Key: MAPREDUCE-4808
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.2-alpha
>            Reporter: Arun C Murthy
>            Assignee: Mariappan Asokan
>             Fix For: 2.0.3-alpha
>
>         Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
> mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
> mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf
>
>
> Allow reduce-side merge to be pluggable for MAPREDUCE-2454

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
