Google has a very interesting tech talk up about Dryad, Microsoft's distributed execution framework. A paper on it has been out for a while, but the video has more information about how the system has been used since the paper was published.
http://www.youtube.com/watch?v=WPhE5JCP2Ak

The slide comparing the time taken when spilling to disk between vertices versus operating purely in memory (around minute 26) is definitely something to think about. Higher-level frameworks such as HBase and Pig are already being developed on top of the MapReduce primitive, so allowing the (perennially discussed) 'multi-reduce' concept to sneak into Hadoop ought to be very attractive (see: http://www.nabble.com/Poly-reduce--tf4313116.html#a13437687 ).

I really hope this will help restart the discussion of direct map->reduce links.

Thanks,
Stu Hood
Webmail.us
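
P.S. For anyone who has not followed the 'multi-reduce' threads, here is a rough sketch of the idea in plain Python. It is not Hadoop code and uses no Hadoop API: every name in it (shuffle, map_words, reduce_count, reduce_group, run_multi_reduce) is made up for illustration, and everything runs in memory in one process. It only shows the shape of the pipeline: the output of the first reduce is shuffled straight into a second reduce, with no identity map job and no intermediate write to disk in between.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group (key, value) pairs by key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def map_words(line):
    """Map: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word, 1


def reduce_count(word, ones):
    """First reduce: count a word, then re-key the result by that count."""
    yield sum(ones), word


def reduce_group(count, words):
    """Second reduce: collect all words that share the same count."""
    yield count, sorted(words)


def run_multi_reduce(lines):
    """map -> shuffle -> reduce -> shuffle -> reduce, with no second map phase."""
    mapped = (pair for line in lines for pair in map_words(line))
    stage1 = (out for key, values in shuffle(mapped).items()
              for out in reduce_count(key, values))
    # The first reduce's output feeds the next shuffle directly; in a chain of
    # two ordinary MapReduce jobs it would be written out and re-read by a map.
    stage2 = (out for key, values in shuffle(stage1).items()
              for out in reduce_group(key, values))
    return dict(stage2)
```

Calling run_multi_reduce(["a b a", "c b a"]) groups the words by how often they occur. With two chained MapReduce jobs, the same result would need the first job's output materialized and then passed through a do-nothing map before the second reduce could see it.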
