[ 
https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647781#action_12647781
 ] 

Jothi Padmanabhan commented on HADOOP-3293:
-------------------------------------------

Sorry, I should have said "Patch for review"; the patch was tested locally. 
I also ran a test to measure the performance improvement from the patch. I 
allocated a 440-node cluster and ran randomwriter with 40 maps, each map writing 
25G of output. I then killed the task trackers on the nodes that ran the maps and 
ran a modified sort (no map output, no reduces) with a minimum input split of 10G. 
I found that, averaged over three runs, the patch was about 17 seconds faster 
than trunk (175 secs as opposed to 192 secs).
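
For readers following the issue, here is a rough sketch of the idea in the issue 
title: when a split spans several blocks, pick the host that holds the most 
bytes of the split. This is only an illustration, not the code in 
hadoop-3293.patch. The class SplitHostPicker, the nested Block type and the 
method hostWithMostBytes are hypothetical names made up for this example, and 
ties are broken arbitrarily in favour of the host seen first.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from the Hadoop source.
final class SplitHostPicker {

    // One block of a file: its byte range and the hosts holding a replica.
    static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    // Returns the host holding the most bytes of the split
    // [splitStart, splitStart + splitLength), or null if no block overlaps it.
    static String hostWithMostBytes(List<Block> blocks, long splitStart, long splitLength) {
        long splitEnd = splitStart + splitLength;
        // LinkedHashMap keeps first-seen order, which makes the tie-break deterministic.
        Map<String, Long> bytesPerHost = new LinkedHashMap<>();

        for (Block b : blocks) {
            // Number of bytes of this block that fall inside the split.
            long overlap = Math.min(splitEnd, b.offset + b.length) - Math.max(splitStart, b.offset);
            if (overlap <= 0) {
                continue; // block lies entirely outside the split
            }
            // Every replica host is credited with the overlapping bytes.
            for (String host : b.hosts) {
                bytesPerHost.merge(host, overlap, Long::sum);
            }
        }

        String best = null;
        long bestBytes = 0;
        for (Map.Entry<String, Long> e : bytesPerHost.entrySet()) {
            if (e.getValue() > bestBytes) { // strict '>' keeps the first host on a tie
                best = e.getKey();
                bestBytes = e.getValue();
            }
        }
        return best;
    }
}
```

For example, with one block at bytes 0-99 on hostA and hostB and a second block 
at bytes 100-199 on hostB and hostC, a split covering bytes 80-179 has 20 bytes 
on hostA, 100 on hostB and 80 on hostC, so the sketch returns hostB.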

> When an input split spans cross block boundary, the split location should be 
> the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>         Attachments: hadoop-3293.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
