[ https://issues.apache.org/jira/browse/HADOOP-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710271#action_12710271 ]
dhruba borthakur commented on HADOOP-5831:
------------------------------------------

This looks like a really impressive performance gain! Awesome.

> Implement memory-to-memory merge in the reduce
> ----------------------------------------------
>
>                 Key: HADOOP-5831
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5831
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>
> HADOOP-3446 fixed the reduce to not flush the in-memory shuffled map-outputs
> before feeding them to the reduce. However, for latency-sensitive applications
> with lots of memory, like terasort, this hurts performance because the fan-in
> for the final in-memory merge is too large (all 8000 map-outputs were
> in-memory), resulting in suboptimal performance.
> When I put in an intermediate memory-to-memory merge for terasort's reduce
> (thereby avoiding disk I/O) to cut the fan-in from 8000 to <100, the 'reduce'
> phase (including the local datanode-write) sped up 2.5x (from 10s to 4s).
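The idea in the description, merging many small in-memory map outputs into fewer, larger in-memory segments so that the final merge has a fan-in of under 100 instead of 8000, can be sketched in Python. This is a minimal illustration assuming each map output is a sorted list of keys already held in memory; the functions `merge_segments` and `reduce_fan_in` and the `max_fan_in` parameter are hypothetical names, not Hadoop's API or the actual patch.

```python
import heapq
import random
from typing import List


def merge_segments(segments: List[List[int]]) -> List[int]:
    """k-way merge of already-sorted in-memory segments (hypothetical helper)."""
    return list(heapq.merge(*segments))


def reduce_fan_in(segments: List[List[int]], max_fan_in: int) -> List[List[int]]:
    """Hypothetical intermediate memory-to-memory merge.

    Repeatedly merges groups of up to `max_fan_in` segments, entirely in
    memory (no spill to disk), until at most `max_fan_in` segments remain.
    """
    if max_fan_in < 2:
        raise ValueError("max_fan_in must be at least 2")
    while len(segments) > max_fan_in:
        segments = [
            merge_segments(segments[i:i + max_fan_in])
            for i in range(0, len(segments), max_fan_in)
        ]
    return segments


if __name__ == "__main__":
    random.seed(0)
    # 8000 small sorted "map outputs", mirroring the terasort example.
    map_outputs = [sorted(random.sample(range(100_000), 5)) for _ in range(8000)]
    intermediate = reduce_fan_in(map_outputs, max_fan_in=100)
    print(len(intermediate))  # 80: the final merge now has a fan-in of 80
    final = merge_segments(intermediate)
    print(len(final))  # 40000
```

The trade-off sketched here is extra in-memory copying in exchange for a much smaller heap in the final merge; with 8000 inputs and `max_fan_in=100`, one intermediate pass is enough.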