Yeah, near the end of his posting, Vuk mentioned that they had considered adding checkpointing (by asynchronously storing the output of an R1|M2 to disk while it was being piped to R2), but didn't get around to it.
Thanks,
Stu

-----Original Message-----
From: Joydeep Sen Sarma <[EMAIL PROTECTED]>
Sent: Friday, November 9, 2007 1:12pm
To: [email protected], [EMAIL PROTECTED]
Subject: RE: Tech Talk: Dryad

I think we have to think harder about how to address the problems with managing errors and keeping track of too much state/rolling back etc.

This field is new to me - but I do remember from grad school that checkpointing is a very relevant and researched topic in parallel computing in general (which is really what the commit to hdfs between one reduce and the next map does).

(pretty vague - will try to do some reading when I find some time :-))

-----Original Message-----
From: Stu Hood [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 09, 2007 10:03 AM
To: [email protected]
Subject: Re: Tech Talk: Dryad

I did read the conclusion of the previous thread, which was that nobody "thought" that the performance gains would be worth the added complexity. I simply think that if a patch is available, the developer should be encouraged to submit it for review, since the topic has been discussed so frequently.

I think our concept of "the" map/reduce primitive has been limited in scope to the capabilities that Google described. There is no reason not to explore potentially beneficial additions (even if Google didn't think they were worthwhile).

Yes, Dryad is more confusing, because it is using a more flexible primitive. I'm not suggesting that Hadoop should be rewritten to use a DAG at its core, but we do already have the o.a.h.m.jobcontrol.JobControl module, so _somebody_ must think the concept is useful.

Re: Side note: As the presenter explained, he uses a small example first to demonstrate the linear speedup. Next (at ~32 minutes) he goes to an example of sorting 10TB on 1800 machines in ~12 minutes...
Thanks,
Stu

-----Original Message-----
From: Owen O'Malley <[EMAIL PROTECTED]>
Sent: Friday, November 9, 2007 12:32pm
To: [email protected]
Subject: Re: Tech Talk: Dryad

On Nov 9, 2007, at 8:49 AM, Stu Hood wrote:

> Currently there is no sanctioned method of 'piping' the reduce
> output of one job directly into the map input of another (although
> it has been discussed: see the thread I linked before:
> http://www.nabble.com/Poly-reduce--tf4313116.html ).

Did you read the conclusion of the previous thread? The performance gains in avoiding the second map input are trivial compared to the gains in simplicity of having a single data path and re-execution story. During a reasonably large job, roughly 98% of your maps are reading data on the _same_ node. Once we put in rack locality, it will be even better.

I'd much, much rather build the map/reduce primitive and support it very well than add the additional complexity of any sort of poly-reduce. I think it is very appropriate for systems like Pig to include that kind of optimization, but it should not be part of the base framework.

I watched the front of the Dryad talk and was struck by how complex it quickly became. It does give the application writer a lot of control, but to do the equivalent of a map/reduce sort with 100k maps and 4k reduces with automatic spill-over to disk during the shuffle seemed _really_ complicated.

On a side note, in the part of the talk that I watched, the scaling graph went from 2 to 9 nodes. Hadoop's scaling graphs go to 1000's of nodes. Did they ever suggest later in the talk that it scales up higher?

-- Owen
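
For readers following the checkpointing idea raised at the top of this thread (asynchronously storing the output of an R1|M2 stage to disk while it is piped on to R2), here is a rough Python sketch of the general pattern. It is not Dryad's or Hadoop's implementation; the function names `checkpointed` and `replay`, the one-record-per-line file format, and the use of a single background thread are all assumptions made for illustration.

```python
import queue
import threading


def checkpointed(records, path):
    """Pass records downstream while a background thread copies them to `path`.

    Hypothetical helper: records are assumed to be newline-free strings.
    """
    pending = queue.Queue()

    def writer():
        # Drain the queue to disk until the end-of-stream sentinel arrives.
        with open(path, "w") as f:
            while True:
                item = pending.get()
                if item is None:
                    break
                f.write(item + "\n")

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        for record in records:
            pending.put(record)  # hand a copy to the checkpoint writer
            yield record         # and pipe it straight to the next stage
    finally:
        pending.put(None)        # signal end of stream
        thread.join()            # checkpoint file is closed after this


def replay(path):
    """Re-read a completed checkpoint so a failed downstream stage can restart."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")
```

A downstream stage would consume `checkpointed(stage_output, "stage1.ckpt")` directly; if that stage fails after the upstream has finished, it could be re-run from `replay("stage1.ckpt")` instead of re-executing the upstream stage. A partially written checkpoint is not handled here, which is part of the state-tracking difficulty discussed in the thread.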
