[jira] Commented: (HADOOP-939) No-sort optimization

Doug Cutting (JIRA) Thu, 25 Jan 2007 17:22:10 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467717
 ]


Doug Cutting commented on HADOOP-939:
-------------------------------------

I suspect that most of the performance gains to be had by declaring input to be 
sorted can also be had by using heuristics that also speed things when input is 
only nearly sorted.  (By "nearly sorted" I mean things like merging a set of 
updates into a sorted database, e.g., the crawl db update task in Nutch.)

Eric Baldeschwieler proposed a simple model for MapReduce performance.  If you 
assume that disks can read and write at 100MB/s, and that nodes can talk within 
rack at 100MB/s (Gb/s) and to nodes in another rack at 10MB/s, then a MapReduce 
requires the following number of seconds per 100MB.  (Note that this assumes 
various sort optimizations that are already in progress, where map outputs are 
buffered and sorted before they're spilled to the local disk on map nodes, and 
reduce inputs are buffered and merged before they're spilled to the local disk 
on the reduce node, so that, in many cases, reduce can proceed without an 
explicit sort stage but simply by merging a set of already sorted input files 
from the local disk.)

a.  1 read input data from local drive on map node
[ map ]
b.  1 write batches of sorted output data to temporary file on map node
c. 10 shuffle batches of sorted data to reduce node
d.  1 write batches of sorted data to reduce node
[ reduce]
e.  1 write one copy of output locally
f.  2 transfer and write one copy to another node on the same rack
g. 11 transfer and write one copy to an off-rack node

So the total is 27s/100MB.  Only two of those are really sort-specific, (b) and 
(d).  14 (more than half) are unavoidable.

The biggest chunk of fat to go after for pre-sorted input is (c).  This can be 
eliminated if maps can be placed near reduces.  For example, tasktrackers might 
report the size of each partition they're generating and the jobtracker might 
use this to schedule reduces on racks which already have a lot of their input.


> No-sort optimization
> --------------------
>
>                 Key: HADOOP-939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-939
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>         Environment: all
>            Reporter: Doug Judd
>
> There should be a way to tell the mapred framework that the output of the 
> map() phase will already be sorted.  The Reduce phase can just merge the 
> intermediate files together without sorting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-939) No-sort optimization

Reply via email to