[ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647781#action_12647781 ]
Jothi Padmanabhan commented on HADOOP-3293:
-------------------------------------------

Sorry, I should have said "Patch for review"; the patch was tested locally.

I also ran a test to demonstrate the performance improvement from the patch. I allocated a 440-node cluster and ran randomwriter with 40 maps, each map writing 25G of output. I then killed the task trackers on the nodes that ran the maps. I then ran a modified sort (no map output, no reduces) with a minimum input split of 10G. I found that, averaged over three runs, the patch was about 17 seconds faster than trunk (175 secs as opposed to 192 secs).

> When an input split spans cross block boundary, the split location should be
> the host having most of bytes on it.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>         Attachments: hadoop-3293.patch
>
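
For readers who want the idea in the issue summary spelled out, here is a minimal sketch of picking split locations by the number of bytes each host holds. It is not the code in hadoop-3293.patch. The function names, the (offset, length, hosts) block tuples, the max_hosts parameter, and the tie-break by host name are all assumptions made for illustration.

```python
from collections import defaultdict


def bytes_per_host(split_start, split_length, blocks):
    """Return {host: bytes of the split stored on that host}.

    blocks is a list of (offset, length, hosts) tuples; this layout is a
    hypothetical stand-in for real block location metadata.
    """
    split_end = split_start + split_length
    totals = defaultdict(int)
    for offset, length, hosts in blocks:
        # Size of the overlap between the split and this block.
        overlap = min(split_end, offset + length) - max(split_start, offset)
        if overlap <= 0:
            continue  # block lies entirely outside the split
        for host in hosts:
            totals[host] += overlap
    return dict(totals)


def preferred_hosts(split_start, split_length, blocks, max_hosts=3):
    """Rank hosts by how many bytes of the split they hold, largest first.

    Ties are broken by host name so the result is deterministic
    (an assumption of this sketch, not something the issue specifies).
    """
    totals = bytes_per_host(split_start, split_length, blocks)
    ranked = sorted(totals, key=lambda host: (-totals[host], host))
    return ranked[:max_hosts]


if __name__ == "__main__":
    MB = 1024 * 1024
    # Two 64 MB blocks; the split covers the last 16 MB of the first
    # block and the first 48 MB of the second.
    example_blocks = [
        (0, 64 * MB, ["hostA", "hostB"]),
        (64 * MB, 64 * MB, ["hostB", "hostC"]),
    ]
    print(preferred_hosts(48 * MB, 64 * MB, example_blocks))
```

In this example hostB holds replicas of both blocks, so it covers the whole split and ranks first. hostA ranks last because it holds only the 16 MB tail of the first block, even though that is the block in which the split starts.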