[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557508#comment-13557508 ]

Arun C Murthy commented on MAPREDUCE-4808:
------------------------------------------

bq. I will try to explain a simple use case of an external implementation of 
merge on the reduce side. Let us say this merge implementation has some fixed 
area of memory (Java byte array) allocated to store the shuffled data. This may 
be done to avoid frequent garbage collection by the JVM or for better processor
cache efficiency.
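To make the quoted use case concrete, here is a minimal, hypothetical Java sketch of a merge buffer that allocates one fixed byte[] up front and reuses it, so storing shuffled segments does not create new garbage for the JVM to collect. The class and method names (FixedMergeBuffer, tryStore, reset) are illustrative assumptions, not part of Hadoop's MergeManager API.

```java
import java.util.Arrays;

/** Hypothetical sketch: a fixed, preallocated arena for shuffled map-output bytes. */
class FixedMergeBuffer {
    private final byte[] arena; // allocated once and reused for every merge pass
    private int used;           // number of bytes currently occupied

    FixedMergeBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.arena = new byte[capacity];
    }

    /** Copies a shuffled segment into the arena; returns its offset, or -1 if it does not fit. */
    int tryStore(byte[] segment) {
        if (segment.length > arena.length - used) {
            return -1; // caller must merge or spill before accepting more data
        }
        int offset = used;
        System.arraycopy(segment, 0, arena, offset, segment.length);
        used += segment.length;
        return offset;
    }

    /** Returns a copy of a stored segment (for inspection only; it allocates). */
    byte[] read(int offset, int length) {
        return Arrays.copyOfRange(arena, offset, offset + length);
    }

    int remaining() {
        return arena.length - used;
    }

    /** Makes the whole arena available again without allocating a new array. */
    void reset() {
        used = 0;
    }
}
```

When tryStore returns -1, an implementation along these lines would merge or spill the buffered segments and then call reset(), rather than growing the array.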

Asokan - this is the first time I've heard of this use case, which seems like 
something Syncsort can take advantage of; without knowing about it, I had been 
viewing the proposal through the lens of 'limit-N/hash-join' merges etc.

In future, being clear and upfront about use-cases will help avoid further 
confusion of this kind.

----

Having said that, I still feel a better approach would be to use a custom 
shuffle via MAPREDUCE-4049 and friends, since you get more control. E.g. you 
might want to defer the shuffle based on both on-heap memory (byte[]) and 
off-heap memory (JNI or DirectBuffers) for the Syncsort plugin, and clearly the 
current MergeManager will not suffice for that.
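As a rough illustration of deferring the shuffle based on both kinds of memory, the hypothetical Java sketch below tracks separate on-heap and off-heap budgets and tells the caller whether a fetched map output should go into a heap ByteBuffer (backed by a byte[]), a direct ByteBuffer (outside the Java heap), or be deferred. TwoTierMemoryBudget and its methods are assumptions for illustration; they are not the MAPREDUCE-4049 plugin interface.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: decides where a fetched map output may live, or whether to defer it. */
class TwoTierMemoryBudget {
    enum Placement { ON_HEAP, OFF_HEAP, DEFER }

    private final long heapLimit;
    private final long offHeapLimit;
    private long heapUsed;
    private long offHeapUsed;

    TwoTierMemoryBudget(long heapLimit, long offHeapLimit) {
        this.heapLimit = heapLimit;
        this.offHeapLimit = offHeapLimit;
    }

    /** Prefers on-heap space, falls back to off-heap, otherwise asks the caller to defer. */
    Placement place(long size) {
        if (heapUsed + size <= heapLimit) {
            heapUsed += size;
            return Placement.ON_HEAP;
        }
        if (offHeapUsed + size <= offHeapLimit) {
            offHeapUsed += size;
            return Placement.OFF_HEAP;
        }
        return Placement.DEFER; // retry the fetch after some memory is released
    }

    /** Returns reserved bytes to the tier they were taken from. */
    void release(Placement placement, long size) {
        if (placement == Placement.ON_HEAP) {
            heapUsed -= size;
        } else if (placement == Placement.OFF_HEAP) {
            offHeapUsed -= size;
        }
    }

    /** Allocates a buffer of the kind matching the placement decision. */
    static ByteBuffer allocate(Placement placement, int size) {
        switch (placement) {
            case ON_HEAP:
                return ByteBuffer.allocate(size);       // backed by a byte[] on the heap
            case OFF_HEAP:
                return ByteBuffer.allocateDirect(size); // direct buffer outside the heap
            default:
                throw new IllegalStateException("deferred fetches must not allocate");
        }
    }
}
```

Preferring the heap first is an arbitrary choice in this sketch; a real plugin might order the tiers differently or size them from job configuration.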

However, if this unblocks you in the short run, I think the approach is fine. 
Thanks for the clarification. I'll take another look at the details of the 
patch once you upload it, but it seems mostly fine to me. Thanks.
                
> Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
> implementations
> ----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4808
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>            Assignee: Mariappan Asokan
>         Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
> mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
> mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
> mapreduce-4808.patch, MergeManagerPlugin.pdf, MR-4808.patch
>
>
> Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for 
> alternate implementations to be able to reuse portions of the default 
> implementation. 
> This would come with the strong caveat that these classes are LimitedPrivate 
> and Unstable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
