Folks,
I am not sure that I understand the issue, but I would be very interested
to learn what you are talking about. Could you please describe the problem
in more detail?
Thanks,
DT
----- Original Message -----
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Friday, November 09, 2007 8:20 AM
Subject: Re: Tech Talk: Dryad
Stu Hood wrote:
The slide comparing the time taken to spill to disk between vertices vs
operating purely in memory (around minute 26) is definitely something to
think about.
I have not had a chance to watch the video yet, but, in MapReduce, if the
intermediate dataset is larger than the RAM on your cluster, then you must
spill to disk in order to sort. (When it is smaller, then we should of
course avoid disk, but that's not the typical case.) If you don't sort,
then it's just map, and piping a sequence of maps together is trivial to
do on the same host, no need to even move the data over the wire. So I
don't yet see the direct relevance. What am I missing? (Maybe I should
watch the video...)
Doug
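
To illustrate Doug's point that a sequence of maps without a sort can run on a single host, here is a minimal sketch in plain Python. It is not Hadoop or Dryad code; the name pipe_maps and the sample functions are hypothetical, chosen only for this example. Each record streams through every map function in turn, so nothing has to be buffered, spilled to disk, or sent over the wire.

```python
from typing import Any, Callable, Iterable, Iterator


def pipe_maps(records: Iterable[Any], *map_fns: Callable[[Any], Any]) -> Iterator[Any]:
    """Chain map functions lazily over a stream of records.

    Hypothetical helper for illustration only. No sort happens here,
    so each record can flow through all stages one at a time.
    """
    stream = iter(records)
    for fn in map_fns:
        # map() is lazy: this only wraps the stream, it does not consume it.
        stream = map(fn, stream)
    return stream


if __name__ == "__main__":
    # Two map stages applied back to back on the same host.
    for value in pipe_maps(["a", "bb", "ccc"], len, lambda n: n * 10):
        print(value)
```

A sort changes the picture because it needs to see all of the records before it can emit the first one, which is why an intermediate dataset larger than RAM forces a spill to disk.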