[
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mariappan Asokan updated MAPREDUCE-4807:
----------------------------------------
Attachment: mapreduce-4807-4809.patch
Hi Alejandro,
I did not want to create a simple mock plugin test, for a couple of reasons:
* An end-to-end test of MAPREDUCE-4807 combined with MAPREDUCE-4809
exercises the full MR data flow.
* It is an interesting test that demonstrates how a merge operation can be
supported in Hadoop.
Currently, you can only sort, even if you have multiple input files that are
already sorted and you just want to merge them. The MapOutputCollector plugin in
the test routes the <key, value> pairs to the proper partition such that the
sort order is preserved within each partition. This speeds up the map tasks,
since the O(N log N) cost of sorting is reduced to O(N) for a merge.
The reduce tasks still incur O(N log N) time in the merge, though.
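To illustrate the idea, here is a minimal standalone sketch in plain Java. It is not the code in the patch and it does not use any Hadoop API. The class name MergeSketch, the integer keys, and the split points are all assumptions made up for the example. The route method appends already-sorted keys to range partitions without sorting them, and the merge method combines sorted runs for one partition the way a reduce-side merge would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: none of these names come from the patch or from Hadoop.
final class MergeSketch {

    // Range partitioning over ascending split points: keys below splits[0] go to
    // partition 0, keys in [splits[i-1], splits[i]) go to partition i, and all
    // remaining keys go to the last partition.
    static int partitionFor(int key, int[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    // Map side: the input is already sorted, so appending each key to its
    // partition keeps every partition sorted and no sort step is needed.
    static List<List<Integer>> route(int[] sortedInput, int[] splits) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i <= splits.length; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int key : sortedInput) {
            partitions.get(partitionFor(key, splits)).add(key);
        }
        return partitions;
    }

    // Reduce side: k-way merge of sorted runs using a min-heap.
    // Each heap entry is {value, runIndex, offsetInRun}.
    static List<Integer> merge(List<List<Integer>> sortedRuns) {
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < sortedRuns.size(); r++) {
            if (!sortedRuns.get(r).isEmpty()) {
                heap.add(new int[] {sortedRuns.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            out.add(head[0]);
            List<Integer> run = sortedRuns.get(head[1]);
            int next = head[2] + 1;
            if (next < run.size()) {
                heap.add(new int[] {run.get(next), head[1], next});
            }
        }
        return out;
    }
}
```

In this sketch the map side does one pass over the input with a binary search over the split points per key, so its cost grows linearly in N for a fixed number of partitions. The merge costs a heap operation per record, which depends on the number of runs being merged.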
There is one caveat: the test may make the patch somewhat larger. I think it
is worth it.
Please review and give your feedback.
Thanks.
-- Asokan
> Allow MapOutputBuffer to be pluggable
> -------------------------------------
>
> Key: MAPREDUCE-4807
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Affects Versions: 2.0.2-alpha
> Reporter: Arun C Murthy
> Assignee: Mariappan Asokan
> Fix For: 2.0.3-alpha
>
> Attachments: mapreduce-4807-4809.patch, mapreduce-4807.patch,
> mapreduce-4807.patch, mapreduce-4807.patch
>
>
> Allow MapOutputBuffer to be pluggable