On Nov 9, 2007, at 8:49 AM, Stu Hood wrote:
Currently there is no sanctioned method of 'piping' the reduce output of one job directly into the map input of another (although it has been discussed: see the thread I linked before: http://www.nabble.com/Poly-reduce--tf4313116.html).
Did you read the conclusion of the previous thread? The performance gains from avoiding the second map input are trivial compared to the gains in simplicity of having a single data path and re-execution story. During a reasonably large job, roughly 98% of your maps are reading data on the _same_ node. Once we put in rack locality, it will be even better.
I'd much, much rather build the map/reduce primitive and support it very well than add the additional complexity of any sort of poly-reduce. I think it is very appropriate for systems like Pig to include that kind of optimization, but it should not be part of the base framework.
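To make the single-data-path point concrete, here is a rough sketch of chaining two jobs with the classic org.apache.hadoop.mapred API. The paths are placeholders and the identity mapper/reducer stand in for real application code, and the exact driver calls have varied across early releases, but the shape is the same: the reduce output of the first job lands in HDFS and simply becomes the map input of the second.

    // Rough sketch: two chained jobs whose only link is an HDFS directory.
    // Placeholder paths; IdentityMapper/IdentityReducer stand in for real code.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class ChainedJobs {
      public static void main(String[] args) throws Exception {
        Path input = new Path("/user/example/input");
        Path pass1Out = new Path("/user/example/pass1");
        Path output = new Path("/user/example/output");

        // Pass 1: an ordinary map/reduce job; its reduce output is written to HDFS.
        JobConf pass1 = new JobConf(ChainedJobs.class);
        pass1.setJobName("pass1");
        pass1.setMapperClass(IdentityMapper.class);
        pass1.setReducerClass(IdentityReducer.class);
        pass1.setOutputKeyClass(LongWritable.class);  // default TextInputFormat keys are offsets
        pass1.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(pass1, input);
        FileOutputFormat.setOutputPath(pass1, pass1Out);
        JobClient.runJob(pass1);                      // blocks until pass 1 finishes

        // Pass 2: its map input is simply pass 1's output directory, so the maps
        // are scheduled near the blocks they read -- the locality argument above.
        JobConf pass2 = new JobConf(ChainedJobs.class);
        pass2.setJobName("pass2");
        pass2.setMapperClass(IdentityMapper.class);
        pass2.setReducerClass(IdentityReducer.class);
        pass2.setOutputKeyClass(LongWritable.class);
        pass2.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(pass2, pass1Out);
        FileOutputFormat.setOutputPath(pass2, output);
        JobClient.runJob(pass2);
      }
    }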
I watched the first part of the Dryad talk and was struck by how quickly it became complex. It does give the application writer a lot of control, but doing the equivalent of a map/reduce sort with 100k maps and 4k reduces, with automatic spill-over to disk during the shuffle, seemed _really_ complicated.
On a side note, in the part of the talk that I watched, the scaling graph went from 2 to 9 nodes. Hadoop's scaling graphs go to thousands of nodes. Did they ever suggest later in the talk that it scales up higher?
-- Owen
