[
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500476#comment-13500476
]
Alejandro Abdelnur commented on MAPREDUCE-2454:
-----------------------------------------------
Following up on Arun's concern about 'passing a shuffle down to the merger':
after spending some extra time looking at the code with and without the patch,
I agree with Asokan's arguments for passing the shuffle to the merger, as the
latest patch does.
It is a clear separation of concerns: the shuffle only shuffles data, without
having to be aware of how that data is handled afterwards.
The change does not alter the end behavior of the shuffle-merge phase, so it
does not break any existing MR application. Nor can it break any existing
Hadoop plugin (all of this was hardcoded and could not be replaced).
Also, the change does not preclude implementing things like a push shuffle in
the future.
Regarding Arun's suggestion:
bq. It's trivial to return an iterator from a copy-only shuffle which is backed
by a blocking shuffle which waits till any (not all) key/value pairs have been
shuffled over the network.
This would require changes in the shuffle, which could significantly increase
the scope of this JIRA. By contrast, the latest patch does not modify the
Shuffle.
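For illustration only, the kind of iterator Arun describes might look roughly
like the sketch below. The class BlockingShuffleIterator and its offer/finish
methods are hypothetical names, not part of Hadoop or of any patch attached to
this JIRA. The sketch assumes the shuffle's copier threads would push each
fetched key/value pair into it, which is exactly the kind of change to the
shuffle mentioned above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not Hadoop code: an iterator whose hasNext() blocks
// until at least one shuffled key/value pair (not all of them) is available.
final class BlockingShuffleIterator<K, V> implements Iterator<Map.Entry<K, V>> {
    // Sentinel marking the end of the shuffled data (compared by identity).
    private final Map.Entry<K, V> endMarker = new SimpleEntry<>(null, null);
    private final BlockingQueue<Map.Entry<K, V>> queue = new LinkedBlockingQueue<>();
    private Map.Entry<K, V> next; // look-ahead element, null if none taken yet

    // Assumed to be called by the copier threads for each pair they fetch.
    void offer(K key, V value) {
        queue.add(new SimpleEntry<>(key, value));
    }

    // Assumed to be called once, after all map outputs have been copied.
    void finish() {
        queue.add(endMarker);
    }

    @Override
    public boolean hasNext() {
        if (next == null) {
            try {
                // Blocks until any pair (or the end marker) arrives.
                next = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for shuffle data", e);
            }
        }
        return next != endMarker;
    }

    @Override
    public Map.Entry<K, V> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Map.Entry<K, V> result = next;
        next = null;
        return result;
    }
}
```

A consumer would simply loop over hasNext()/next() while the copiers call
offer(...) concurrently and finish() at the end. The consumer side is simple;
the scope concern is the producer side, since the existing shuffle would have
to be reworked to feed such an iterator.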
My take here is along the lines of Arun's comment:
bq. I've spent sometime thinking about this - and I feel we can do something
far simpler to address Syncsort's goal of plugging in your proprietary sort
while mitigating risk to MR itself....How about this: I feel we could
accomplish both goals by something very simple..
Echoing Arun, we are mitigating risk while enabling the desired functionality.
> Allow external sorter plugin for MR
> -----------------------------------
>
> Key: MAPREDUCE-2454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
> Reporter: Mariappan Asokan
> Assignee: Mariappan Asokan
> Priority: Minor
> Labels: features, performance, plugin, sort
> Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf,
> KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java,
> mapreduce-2454-modified-code.patch, mapreduce-2454-modified-test.patch,
> mapreduce-2454-new-test.patch, mapreduce-2454.patch, mapreduce-2454.patch,
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch,
> mapreduce-2454.patch, mapreduce-2454-protection-change.patch,
> mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz,
> ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to
> facilitate external sorter plugins both on the Map and Reduce sides.