[ 
https://issues.apache.org/jira/browse/HADOOP-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657310#action_12657310
 ] 

Chris Douglas commented on HADOOP-4888:
---------------------------------------

With the current patch, HttpClient was slightly slower on average in five runs 
of a shuffle benchmark on trunk, though the gap (433 vs. 422) is well within 
one standard deviation.

498 nodes, 256MB/map, 495 maps, no map-side merge, half of reduce input from 
memory, no intermediate compression.

|| Version || 1 || 2 || 3 || 4 || 5 || avg || std.d ||
| r727228 | 406 | 485 | 360 | 448 | 411 | 422 | 48 |
| r727228 + patch | 418 | 357 | 501 | 446 | 442 | 433 | 52 |
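
For anyone who wants to check the avg and std.d columns, here is a minimal 
sketch in Python that recomputes them from the five runs in each row. The 
names (runs, summary) are made up for illustration, and using the sample 
standard deviation (n - 1 denominator) is an assumption about how the table 
was produced; under that assumption the first row comes out near 47 rather 
than 48.

```python
from statistics import mean, stdev

# Per-run timings copied from the table above (units as reported there).
runs = {
    "r727228": [406, 485, 360, 448, 411],
    "r727228 + patch": [418, 357, 501, 446, 442],
}

# Assumption: the std.d column is a sample standard deviation (n - 1).
summary = {
    version: (mean(times), stdev(times))
    for version, times in runs.items()
}

for version, (avg, sd) in summary.items():
    print(f"{version}: avg={avg:.1f} std.d={sd:.1f}")

# Gap between the two averages, for comparison against the spread.
gap = summary["r727228 + patch"][0] - summary["r727228"][0]
print(f"difference in averages: {gap:.1f}")
```

The difference in averages is smaller than either standard deviation, which 
is consistent with the two versions being indistinguishable in this benchmark.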

Stragglers dominated the runs. In both versions, output from the final few maps held 
up the reduce phase, so neither could distinguish itself with better 
throughput, connection reuse, protocol efficiency, etc. Larger benchmarks that 
might compensate for these effects, such as gridmix, cannot be run on available 
nodes.

> Use Apache HttpClient for fetching map outputs
> ----------------------------------------------
>
>                 Key: HADOOP-4888
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4888
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>         Attachments: 4888-0.patch
>
>
> It's worth experimenting with the 
> [HttpClient|http://hc.apache.org/httpclient-3.x/] library to speed up the 
> shuffle.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
