[ https://issues.apache.org/jira/browse/HADOOP-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657310#action_12657310 ]
Chris Douglas commented on HADOOP-4888:
---------------------------------------
With the current patch, HttpClient actually degraded average performance
slightly over five runs of a shuffle benchmark on trunk.
Configuration: 498 nodes, 256MB/map, 495 maps, no map-side merge, half of the
reduce input from memory, no intermediate compression.
|| Version || Run 1 || Run 2 || Run 3 || Run 4 || Run 5 || avg || std.d ||
| r727228 | 406 | 485 | 360 | 448 | 411 | 422 | 48 |
| r727228 + patch | 418 | 357 | 501 | 446 | 442 | 433 | 52 |
Stragglers were dominant. In both versions, output from the final few maps held
up the reduce phase, so neither could distinguish itself with better
throughput, connection reuse, protocol efficiency, etc. Larger benchmarks that
might compensate for these effects, such as gridmix, cannot be run on the
available nodes.
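As a rough check on how small the gap is relative to run-to-run noise, the per-run figures in the table above can be summarized with a short script. This is only a sketch: the helper names `summarize` and `gap_in_stdevs` are hypothetical, and it assumes the std.d column is a sample standard deviation (n - 1 denominator). Under that assumption the first row comes out near 47 rather than the 48 reported, so the table may have been computed or rounded differently.

```python
from statistics import mean, stdev

# Per-run figures copied from the table above (units as reported there).
RUNS = {
    "r727228": [406, 485, 360, 448, 411],
    "r727228 + patch": [418, 357, 501, 446, 442],
}


def summarize(samples):
    """Return (mean, sample standard deviation) for one row of runs."""
    return mean(samples), stdev(samples)


def gap_in_stdevs(baseline, patched):
    """Difference of the two means, divided by the larger sample std dev."""
    base_mean, base_sd = summarize(baseline)
    patch_mean, patch_sd = summarize(patched)
    return (patch_mean - base_mean) / max(base_sd, patch_sd)


if __name__ == "__main__":
    for version, samples in RUNS.items():
        avg, sd = summarize(samples)
        print(f"{version}: avg={avg:.1f} std.d={sd:.1f}")
    gap = gap_in_stdevs(RUNS["r727228"], RUNS["r727228 + patch"])
    print(f"gap between averages: {gap:.2f} std.d")
```

The difference between the two averages is well under one standard deviation, which is consistent with the point that stragglers, not the fetch path, dominated these runs.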
> Use Apache HttpClient for fetching map outputs
> ----------------------------------------------
>
> Key: HADOOP-4888
> URL: https://issues.apache.org/jira/browse/HADOOP-4888
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Chris Douglas
> Assignee: Chris Douglas
> Attachments: 4888-0.patch
>
>
> It's worth experimenting with the
> [HttpClient|http://hc.apache.org/httpclient-3.x/] library to speed up the
> shuffle.