[ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378080 ]
Owen O'Malley commented on HADOOP-195:
--------------------------------------
Looking at the logs of my sort benchmark on 188 nodes, each reduce is fetching
and processing 1 gig of data from ~64k maps. After all of the maps are done, a
reduce takes 8 hours to run, and 7 of those hours are spent fetching the map
outputs.
Timing of getFile calls that completed (average transfer ~15k bytes):
Max: 76 seconds
Avg: 385 ms
Median: 45 ms
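Distribution of completed call times, bucketed by int(log_2(ms)); bucket k
covers [2^k, 2^(k+1)) ms:
  bucket  ms range              count
       0  [1, 2)                  149
       1  [2, 4)               41,449
       2  [4, 8)               96,305
       3  [8, 16)             101,595
       4  [16, 32)         13,060,775
       5  [32, 64)         71,232,197
       6  [64, 128)         9,675,008
       7  [128, 256)        4,569,403
       8  [256, 512)        5,688,185
       9  [512, 1024)       5,196,811
      10  [1024, 2048)      3,878,971
      11  [2048, 4096)      4,267,733
      12  [4096, 8192)      1,855,209
      13  [8192, 16384)       411,456
      14  [16384, 32768)       70,594
      15  [32768, 65536)       24,182
      16  [65536, 131072)           1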
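For reference, a bucket index of the int(log_2(ms)) kind used in the distribution can be computed exactly with integer bit operations instead of floating-point logs. This is a minimal sketch; the class and method names are hypothetical and are not taken from the code that produced the numbers above.

```java
import java.util.TreeMap;

/** Hypothetical helper: histogram of call durations bucketed by int(log_2(ms)). */
public class LatencyHistogram {
    private final TreeMap<Integer, Long> counts = new TreeMap<>();

    /** floor(log_2(ms)) for ms >= 1; durations under 1 ms are clamped into bucket 0. */
    public static int bucket(long ms) {
        if (ms < 1) {
            return 0;
        }
        // The index of the highest set bit equals floor(log_2(ms)).
        return 63 - Long.numberOfLeadingZeros(ms);
    }

    /** Record one completed call that took the given number of milliseconds. */
    public void record(long ms) {
        counts.merge(bucket(ms), 1L, Long::sum);
    }

    /** Number of recorded calls that fell into the given bucket. */
    public long count(int bucket) {
        return counts.getOrDefault(bucket, 0L);
    }

    public static void main(String[] args) {
        LatencyHistogram h = new LatencyHistogram();
        // The median, the average and the max from the timings above.
        for (long ms : new long[] {45, 385, 76_000}) {
            h.record(ms);
        }
        // Prints "bucket count" pairs in ascending bucket order.
        h.counts.forEach((b, c) -> System.out.println(b + " " + c));
    }
}
```

As a sanity check, the 76-second maximum lands in bucket 16, which matches the single call counted there.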
Timeouts from getFile: 29120
So the reduce prepare phase is dominated by calls to getFile: 64k calls * 385 ms
is ~6.8 hours, which accounts for nearly all of the 7 hours spent fetching.
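The estimate is simply the number of serial getFile calls multiplied by the average call latency. A small sketch of that arithmetic, assuming "64k" means 64,000 maps and using the 385 ms average (the class name is illustrative only):

```java
import java.util.Locale;

/** Back-of-the-envelope estimate of serial map-output fetch time for one reduce. */
public class FetchEstimate {
    /** Hours spent if {@code maps} fetches run one after another at {@code avgMs} each. */
    public static double serialFetchHours(long maps, double avgMs) {
        double totalMs = maps * avgMs;
        return totalMs / 1000.0 / 3600.0;
    }

    public static void main(String[] args) {
        // 64,000 fetches at 385 ms each = 24,640 s, about 6.84 hours.
        System.out.printf(Locale.ROOT, "%.2f hours%n", serialFetchHours(64_000, 385.0));
    }
}
```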
For a first pass, I'll try increasing the number of threads serving data from 2
to 20 and using the parallel RPC call to fetch 5 files at a time.
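A rough sketch of the client side of this proposal: a fixed pool of 5 threads keeps at most 5 fetches in flight, and the reduce waits for all of them. ParallelCopier and MapOutputFetcher are hypothetical names; the fetcher stands in for the actual parallel RPC call (or any other transport), and the 20 server-side handler threads are a separate setting that is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: copy map outputs with a bounded number of calls in flight. */
public class ParallelCopier {
    /** Hypothetical stand-in for whatever call copies one map's output. */
    public interface MapOutputFetcher {
        long fetch(int mapId) throws Exception; // returns the number of bytes copied
    }

    /** Fetch outputs of maps 0..numMaps-1 with at most {@code parallel} concurrent calls. */
    public static long fetchAll(int numMaps, int parallel, MapOutputFetcher fetcher)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parallel);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < numMaps; i++) {
                final int mapId = i;
                Callable<Long> task = () -> fetcher.fetch(mapId);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // rethrows any fetch failure as ExecutionException
            }
            return total;
        } finally {
            pool.shutdownNow(); // cancels outstanding fetches if one of them failed
        }
    }

    public static void main(String[] args) throws Exception {
        // Fake fetcher: sleeps briefly and reports ~15k bytes, like the average getFile call.
        long bytes = fetchAll(20, 5, mapId -> {
            Thread.sleep(10);
            return 15_000L;
        });
        System.out.println(bytes + " bytes copied");
    }
}
```

In the ideal case, where per-call latency rather than bandwidth is the bottleneck, keeping k calls in flight cuts the serial estimate by up to a factor of k; the server's handler threads and disk/network limits cap the real gain. The sketch also applies no per-fetch timeout, so a stuck call still holds one of the 5 slots.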
Thoughts?
> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
> Key: HADOOP-195
> URL: http://issues.apache.org/jira/browse/HADOOP-195
> Project: Hadoop
> Type: Improvement
> Components: mapred
> Versions: 0.2
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Fix For: 0.3
>
> The map output should be transferred via http instead of rpc, because rpc is
> very slow for this application and its timeout behavior is suboptimal (the
> server sends the data and the client ignores it because it took more than 10
> seconds to be received).
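For comparison with the RPC path, here is a minimal sketch of pulling one map output over HTTP with java.net.HttpURLConnection. The class name, the URL layout, and the timeout values are assumptions for illustration, not Hadoop's actual implementation. The point of interest is that setReadTimeout bounds each blocking read rather than the whole call, so a large output that keeps arriving is not discarded the way a slow RPC response is.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical HTTP client for copying one map output; the URL scheme is assumed. */
public class HttpMapOutputClient {
    /**
     * Copy the body at {@code url} into {@code out} and return the number of bytes copied.
     * The read timeout applies to each blocking read, not to the whole transfer.
     */
    public static long copy(URL url, OutputStream out, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(connectTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " from " + url);
            }
            long total = 0;
            byte[] buf = new byte[64 * 1024];
            try (InputStream in = conn.getInputStream()) {
                int n;
                // Stream the body in chunks instead of materializing it as one RPC reply.
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    total += n;
                }
            }
            return total;
        } finally {
            conn.disconnect();
        }
    }
}
```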
--
This message is automatically generated by JIRA.