Folks,
I am not sure that I understand the issue, but I would be very interested to know what you are talking about. Could you please describe the problem in more detail?
Thanks,
DT
----- Original Message -----
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Friday, November 09, 2007 8:20 AM
Subject: Re: Tech Talk: Dryad


Stu Hood wrote:
The slide comparing the time taken to spill to disk between vertices vs operating purely in memory (around minute 26) is definitely something to think about.

I have not had a chance to watch the video yet, but, in MapReduce, if the intermediate dataset is larger than the RAM on your cluster, then you must spill to disk in order to sort. (When it is smaller, then we should of course avoid disk, but that's not the typical case.) If you don't sort, then it's just map, and piping a sequence of maps together is trivial to do on the same host, with no need to even move the data over the wire. So I don't yet see the direct relevance. What am I missing? (Maybe I should watch the video...)

Doug
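
To illustrate the distinction Doug draws above, here is a minimal sketch in plain Python. It is not Hadoop or Dryad code: the names chained_maps and external_sort and the max_in_memory parameter are hypothetical, chosen only for this example. It assumes records are picklable and mutually comparable. A chain of maps streams one record at a time and never needs disk, whereas a sort over more data than fits in memory has to write sorted runs to disk and merge them.

```python
import heapq
import pickle
import tempfile
from itertools import islice


def chained_maps(records, *map_fns):
    """Pipe records lazily through a sequence of map functions.

    No sort is involved, so only one record is in flight at a time.
    """
    for fn in map_fns:
        records = map(fn, records)
    return records


def _spill(sorted_run):
    """Write one sorted run to a temporary file and rewind it."""
    f = tempfile.TemporaryFile()
    for item in sorted_run:
        pickle.dump(item, f)
    f.seek(0)
    return f


def _read_run(f):
    """Yield the items of one spilled run in the order they were written."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            f.close()
            return


def external_sort(records, max_in_memory):
    """Sort an iterable while sorting at most max_in_memory records at once.

    Each chunk is sorted in memory and spilled to disk as a run; the runs
    are then merged lazily, holding one pending record per run.
    """
    it = iter(records)
    runs = []
    while True:
        chunk = sorted(islice(it, max_in_memory))
        if not chunk:
            break
        runs.append(_spill(chunk))
    return heapq.merge(*(_read_run(f) for f in runs))
```

For example, chained_maps(data, f, g) applies f and then g to each record as it is consumed, while external_sort(data, max_in_memory=3) spills a run to disk for every three records before merging. The chunk size here stands in for available RAM; a real system would budget by bytes rather than record count.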

