Arkady Borkovsky wrote:
Does this model assume that the size of the output of reduce is similar
to the size of the input?
An important class of applications (mentioned earlier in this thread)
uses two inputs:
-- M ("master file"): very large, presorted, and unchanged from run to run;
-- D ("details file"): smaller, different from run to run, and not
necessarily presorted;
and the output size is proportional to the size of D.
In this case the gain from "no-sort" may be much larger: the step-13
"transfer and write to DFS" applies to a smaller amount of data, while
the sort-and-shuffle-related steps 11(b)-(d) are saved on the larger data.
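
A minimal sketch of the master/details pattern described above, assuming
Hadoop's org.apache.hadoop.mapreduce API; the class name
MasterDetailJoinReducer and the "M"/"D" value tags are illustrative
assumptions, not from the thread. The point it shows is that the reducer
emits one record per detail record, so the output size tracks |D| rather
than |M|.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MasterDetailJoinReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // A (hypothetical) mapper is assumed to tag each value: "M\t..." for
    // the master record, "D\t..." for detail records sharing the join key.
    String masterRecord = null;
    List<String> details = new ArrayList<String>();
    for (Text value : values) {
      String v = value.toString();
      if (v.startsWith("M\t")) {
        masterRecord = v.substring(2);   // at most one master record per key
      } else if (v.startsWith("D\t")) {
        details.add(v.substring(2));
      }
    }
    // One output record per detail record: output size is proportional
    // to the size of D, not to the size of M.
    for (String d : details) {
      context.write(key, new Text(d + "\t" + masterRecord));
    }
  }
}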
Could a combiner be used in this hypothetical case? If so, the b-d
steps might be faster too (see the sketch after this message).
Doug
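
A minimal, self-contained sketch of the kind of combiner Doug asks about,
again assuming the Hadoop mapreduce API; the SumCombiner name and the
summing example are illustrative assumptions, since the thread does not
say what the detail records contain. It shows how a combiner shrinks the
data that the b-d sort-and-shuffle steps handle when per-key values can be
partially aggregated on the map side.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable result = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();               // partial sum computed on the map side
    }
    result.set(sum);
    context.write(key, result);     // one value per key reaches the shuffle
  }
}

It would be wired in with job.setCombinerClass(SumCombiner.class) alongside
the normal reducer. This is only correct because summation is associative
and commutative; whether a combiner helps the master/details case above
depends on whether the detail records for a key can be pre-aggregated at all.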