[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557508#comment-13557508 ]
Arun C Murthy commented on MAPREDUCE-4808:
------------------------------------------

bq. I will try to explain a simple use case of an external implementation of merge on the reduce side. Let us say this merge implementation has some fixed area of memory (Java byte array) allocated to store the shuffled data. This may be done to avoid frequent garbage collection by JVM or for better processor cache efficiency.

Asokan - this is the first time I've heard this use case, which seems like something Syncsort can take advantage of. As a consequence, I've been viewing this through the lens of 'limit-N/hash-join' merge etc. In future, being clear and upfront about use-cases will obviously prevent further such confusion.

----

Having said that, I still feel a better approach would be to use a custom shuffle via MAPREDUCE-4049 and friends, since you get more control. For example, you might want to defer shuffle based on memory on the heap (byte[]) and memory outside the heap (JNI or DirectBuffers) for the Syncsort plugin, and clearly the current MergeManager will not suffice for that. However, if this unblocks you in the short run, I think the approach is fine.

Thanks for the clarification. I'll take another look at the details of the patch once you upload it, but it seems mostly fine to me. Thanks.

> Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
> ----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4808
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>            Assignee: Mariappan Asokan
>         Attachments: COMBO-mapreduce-4809-4812-4808.patch,
> mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch,
> mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch,
> mapreduce-4808.patch, MergeManagerPlugin.pdf, MR-4808.patch
>
>
> Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for
> alternate implementations to be able to reuse portions of the default
> implementation.
> This would come with the strong caveat that these classes are LimitedPrivate
> and Unstable.