[ 
http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378080 ] 

Owen O'Malley commented on HADOOP-195:
--------------------------------------

Looking at the logs of my sort benchmark on 188 nodes, each reduce is fetching 
and processing 1 gig of data from ~64k maps. After all of the maps are done, a 
reduce takes 8 hours to run. 7 of those hours are in fetching the map outputs.

Timing of calls to getFile that complete (average ~15k bytes):
   Max: 76 seconds
   Avg: 385 ms
   Median: 45 ms
   Distribution (count, int(log_2(ms))):
     149 0
   41449 1
   96305 2
  101595 3
13060775 4
71232197 5
 9675008 6
 4569403 7
 5688185 8
 5196811 9
 3878971 10
 4267733 11
 1855209 12
  411456 13
   70594 14
   24182 15
       1 16
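
As a sanity check on the distribution above, here is a rough Python sketch 
that totals the calls and finds which bucket holds the midpoint. It assumes 
bucket b covers [2**b, 2**(b+1)) ms, which is my reading of the 
int(log_2(ms)) label; the helper name bucket_at is made up for illustration.

```python
# (bucket, count) pairs copied from the distribution above.
# Assumption: bucket b covers [2**b, 2**(b+1)) milliseconds.
histogram = [
    (0, 149), (1, 41449), (2, 96305), (3, 101595),
    (4, 13060775), (5, 71232197), (6, 9675008), (7, 4569403),
    (8, 5688185), (9, 5196811), (10, 3878971), (11, 4267733),
    (12, 1855209), (13, 411456), (14, 70594), (15, 24182), (16, 1),
]

total = sum(count for _, count in histogram)


def bucket_at(fraction):
    """Return the bucket where the cumulative count first reaches
    the given fraction of all calls (0 < fraction <= 1)."""
    threshold = fraction * total
    running = 0
    for bucket, count in histogram:
        running += count
        if running >= threshold:
            return bucket
    return histogram[-1][0]


median_bucket = bucket_at(0.5)
low_ms = 2 ** median_bucket
high_ms = 2 ** (median_bucket + 1) - 1

# Calls in bucket 10 and above took roughly a second or longer (>= 1024 ms).
slow = sum(count for bucket, count in histogram if bucket >= 10)

print(f"calls: {total}")
print(f"median bucket: {median_bucket} ({low_ms} to {high_ms} ms)")
print(f"share of calls at or above 1024 ms: {slow / total:.1%}")
```

The 45 ms figure sits inside the midpoint bucket, so most calls are fast and 
the long tail is what pulls the average up to 385 ms.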

Timeouts from getFile: 29120

So the reduce prepare phase is being dominated by serial calls to getFile 
(64k * 385 ms = 6.8 hours).
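
The estimate above can be reproduced with a few lines of Python. This is only 
the arithmetic, assuming "~64k" means 64,000 fetches done one after another; 
the variable names are mine.

```python
fetches_per_reduce = 64_000      # "~64k maps", read here as 64,000 (assumption)
avg_call_seconds = 0.385         # average getFile time reported above
typical_call_seconds = 0.045     # the 45 ms figure reported above

# Strictly serial fetching: total time is calls * time per call.
serial_hours = fetches_per_reduce * avg_call_seconds / 3600

# Same number of calls if every call took only the typical 45 ms.
typical_hours = fetches_per_reduce * typical_call_seconds / 3600

print(f"at the average call time: {serial_hours:.1f} hours")
print(f"at the typical call time: {typical_hours:.1f} hours")
```

The gap between the two numbers is the cost of the slow tail of calls.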

For a first pass, I'll try increasing the number of threads serving data to 20 
(from 2) and try the parallel rpc call to fetch 5 files at a time.
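
To illustrate the "5 files at a time" idea, here is a minimal Python sketch of 
a bounded parallel fetch. It is not the Hadoop code: fetch_all and fetch_one 
are hypothetical names, and fetch_one stands in for whatever call actually 
retrieves one map output.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(map_ids, fetch_one, max_parallel=5):
    """Fetch every map output with at most max_parallel requests in flight.

    fetch_one(map_id) is a caller-supplied function (hypothetical here) that
    returns the data for one map or raises on timeout or error.
    Returns (results, failed): a dict of map_id -> data, and a list of
    map_ids whose fetch raised, so the caller can retry them.
    """
    results = {}
    failed = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(fetch_one, m): m for m in map_ids}
        for future in as_completed(futures):
            map_id = futures[future]
            try:
                results[map_id] = future.result()
            except Exception:
                failed.append(map_id)
    return results, failed
```

With a pool like this, one slow or timed-out call only stalls one of the five 
slots instead of the whole reduce.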

Thoughts?

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3

>
> The map output should be transferred via http instead of rpc, because rpc 
> is very slow for this application and the timeout behavior is suboptimal. 
> (The server sends data and the client ignores it because it took more than 
> 10 seconds to be received.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
