Re: Tech Talk: Dryad

Doug Cutting Fri, 09 Nov 2007 08:20:59 -0800

Stu Hood wrote:

The slide comparing the time taken to spill to disk between vertices vs 
operating purely in memory (around minute 26) is definitely something to think 
about.

I have not had a chance to watch the video yet, but, in MapReduce, if
the intermediate dataset is larger than the RAM on your cluster, then
you must spill to disk in order to sort. (When it is smaller, then we
should of course avoid disk, but that's not the typical case.) If you
don't sort, then it's just map, and piping a sequence of maps together
is trivial to do on the same host, with no need to even move the data
over the wire. So I don't yet see the direct relevance. What am I
missing? (Maybe I should watch the video...)
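
To make the distinction concrete, here is a rough single-process sketch in
Python. It is not Hadoop code: the names chain_maps, external_sort, and
max_in_memory are hypothetical, and a record count stands in for available
RAM. Chained maps stream without touching disk, while the sort has to spill
sorted runs once its buffer fills and then merge them.

```python
import heapq
import pickle
import tempfile


def chain_maps(records, *map_fns):
    """Pipe records through several map functions lazily, on one host."""
    for fn in map_fns:
        records = map(fn, records)
    return records


def _spill(sorted_records):
    """Write one sorted run to a temporary file and return the open file."""
    f = tempfile.TemporaryFile()
    for rec in sorted_records:
        pickle.dump(rec, f)
    return f


def _read_run(f):
    """Stream the records of a spilled run back in order."""
    f.seek(0)
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def external_sort(records, max_in_memory):
    """Sort records, spilling a sorted run to disk whenever the buffer fills.

    max_in_memory is a record count used here as a stand-in for RAM size.
    """
    runs, buffer = [], []
    for rec in records:
        buffer.append(rec)
        if len(buffer) >= max_in_memory:
            runs.append(_spill(sorted(buffer)))
            buffer = []
    buffer.sort()
    if not runs:
        # Everything fit in memory, so disk is never touched.
        return iter(buffer)
    # Merge the in-memory remainder with every spilled run.
    return heapq.merge(buffer, *(_read_run(f) for f in runs))
```

For example, list(external_sort(chain_maps(data, f, g), max_in_memory=3))
applies f and then g to each record without materializing the intermediate
results, and only the sort step writes temporary files.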


Doug
