[ http://issues.apache.org/jira/browse/HADOOP-195?page=all ]

Sameer Paranjpye updated HADOOP-195:
------------------------------------

    Attachment: parallel-copiers.txt

Here's a patch that ought to address a lot of the concerns with the copy phase. 
The reduce task now launches a small number of threads (default 5) that fetch 
map outputs in parallel. The difference between this and the parallel RPC is 
that the parallel fetches are independent of each other: as soon as one copy 
finishes, the next one starts. The parallel RPC sends N requests and waits for 
them all to complete before sending the next N.
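
To make the difference between the two policies concrete, here is a toy model 
(not code from the patch). The class name, method names and fetch durations are 
all made up for illustration, and the durations are in arbitrary time units. It 
only computes how long a list of fetches would take under each policy, assuming 
fetches are started in list order.

```java
import java.util.PriorityQueue;

class CopySchedulingSketch {
    // Batched policy (like the parallel RPC): start up to n fetches, wait
    // for all of them, then start the next n. Each batch costs as much as
    // its slowest fetch.
    static long batchedMakespan(long[] durations, int n) {
        long total = 0;
        for (int i = 0; i < durations.length; i += n) {
            long slowest = 0;
            for (int j = i; j < Math.min(i + n, durations.length); j++) {
                slowest = Math.max(slowest, durations[j]);
            }
            total += slowest;
        }
        return total;
    }

    // Independent policy (like the parallel copiers): n copiers, and
    // whichever copier frees up first takes the next fetch in the list.
    // Assumes n >= 1.
    static long independentMakespan(long[] durations, int n) {
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int i = 0; i < n; i++) {
            freeAt.add(0L);
        }
        long finish = 0;
        for (long d : durations) {
            long done = freeAt.poll() + d; // earliest free copier takes it
            freeAt.add(done);
            finish = Math.max(finish, done);
        }
        return finish;
    }

    public static void main(String[] args) {
        // Hypothetical durations: one slow fetch followed by five fast ones.
        long[] durations = {4, 1, 1, 1, 1, 1};
        System.out.println("batched:     " + batchedMakespan(durations, 2));
        System.out.println("independent: " + independentMakespan(durations, 2));
    }
}
```

With these made-up numbers and N=2 the batched policy takes 6 units and the 
independent one takes 5, because one slow fetch no longer holds up the other 
copier. When all fetches take the same time the two policies come out equal.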

I ran Owen's benchmark with this code, sorting 2000GB on 200 nodes with 10 
parallel fetchers per reduce, for a total runtime of 8 hours 53 minutes. The 
(map+copy) finished in approximately 2.5 hours, so the bulk of the time 
was spent in the reduce phase. We saw 41 failures during sort and reduce, which 
pushed out the total runtime.
Tasktracker latency, which causes pings and progress reports from the child to 
fail, appears to be the biggest concern at this point.

The copy phase could be sped up further by improving the hit rate at the 
beginning of the copy. At this time a reduce probes for a random subset of the 
map outputs and fetches as many as are available. The fetcher threads are 
mostly idle early on, and they don't need to be. Having the job tracker furnish 
a list of maps that have completed since the last query from a tasktracker 
would go a long way towards addressing this.
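
One way the "completed since the last query" idea could look is sketched 
below. This is not in the patch and is not an existing Hadoop API. The class 
CompletedMapLog, its methods and the map ids are hypothetical. The assumption 
is that the job tracker keeps an append-only list of completed maps and each 
caller remembers how far into that list it has already read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical job tracker side log of completed maps, in completion order.
class CompletedMapLog {
    private final List<String> completed = new ArrayList<>();

    // Called when a map task finishes.
    synchronized void recordCompletion(String mapId) {
        completed.add(mapId);
    }

    // Returns the maps that completed at or after position 'cursor'.
    // The caller's next cursor is cursor + (size of the returned list).
    synchronized List<String> completedSince(int cursor) {
        int from = Math.min(Math.max(cursor, 0), completed.size());
        return new ArrayList<>(completed.subList(from, completed.size()));
    }

    public static void main(String[] args) {
        CompletedMapLog log = new CompletedMapLog();
        log.recordCompletion("map_0");
        log.recordCompletion("map_1");

        int cursor = 0;
        List<String> fresh = log.completedSince(cursor);
        cursor += fresh.size();
        System.out.println("first query:  " + fresh);

        log.recordCompletion("map_2");
        fresh = log.completedSince(cursor);
        cursor += fresh.size();
        System.out.println("second query: " + fresh);
    }
}
```

Each query returns only maps the caller has not seen yet, so the fetcher 
threads would get known-good outputs to copy instead of probing at random.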


The number of parallel copiers per reduce is controlled by the config variable 
"mapred.reduce.parallel.copiers"; the default is 5.


> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: MapFileSimulator.java, data-transfer-chart.pdf, 
> mapfilesimulator-big.txt, mapfilesimulator-sort2.txt, netstat.log, 
> netstat.xls, parallel-copiers.txt
>
> The map output should be transferred via http instead of rpc, because rpc 
> is very slow for this application and the timeout behavior is suboptimal. 
> (The server sends data and the client ignores it because it took more than 
> 10 seconds to be received.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
