Hi all, I'm just a month into using Hadoop too, and it sounds like we are all wishing for this kind of feature.
> 2. Map tasks of the next step are streamed data directly from preceding
> reduce tasks. This is more along the lines Ted is suggesting - make
> iterative map-reduce a primitive natively supported in Hadoop. This is
> probably a better solution - but more work?

I would like to do basically the same, with one mandatory condition: no spilling of data into temporary files. Keeping all the files that the reduce outputs in RAM would be great in my context. Maybe the solution could be an instance of InMemoryFileSystem? Just passing the reference from the Reduce to the next Map (using an external daemon... that sounds to me like the only viable pattern for MapReduce chaining, correct me if I'm wrong)? Would the inramfs be distributed across all the nodes?

Any working solutions will be greatly appreciated :) I was just speculating; the truth is that I still don't have a clue about it.

Regards,
-Michele Catasta
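To make the idea concrete, here is a toy sketch in plain Java (no Hadoop APIs, and all class/method names are my own invention) of what "chaining without spilling" would look like: the reduce output of step 1 stays in an in-memory map and is handed straight to the map phase of step 2, instead of being written to HDFS and re-read.

```java
import java.util.*;

// Toy illustration of chaining two map-reduce steps with the
// intermediate data held entirely in RAM instead of being spilled
// to temporary files between the jobs.
public class InMemoryChain {

    // Step 1: word count. The "map" phase emits (word, 1) pairs into an
    // in-memory shuffle structure; the "reduce" phase sums them.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> shuffled = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {          // map phase
                shuffled.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        Map<String, Integer> reduced = new HashMap<>();
        shuffled.forEach((word, ones) -> {                    // reduce phase
            int sum = 0;
            for (int one : ones) sum += one;
            reduced.put(word, sum);
        });
        return reduced;
    }

    // Step 2: the next "map" task consumes the previous reduce output
    // directly from memory (here: invert to count -> set of words),
    // with no intermediate files in between.
    static Map<Integer, Set<String>> byCount(Map<String, Integer> counts) {
        Map<Integer, Set<String>> grouped = new TreeMap<>();
        counts.forEach((word, n) ->
                grouped.computeIfAbsent(n, k -> new TreeSet<>()).add(word));
        return grouped;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a b a", "b a");
        Map<String, Integer> counts = wordCount(input);       // step 1
        Map<Integer, Set<String>> grouped = byCount(counts);  // step 2, no spill
        System.out.println(grouped);                          // {2=[b], 3=[a]}
    }
}
```

In real Hadoop the hard part is that the reduce tasks of job 1 and the map tasks of job 2 live in different JVMs on different nodes, so "passing the reference" would need exactly the kind of daemon (or a distributed in-memory filesystem) discussed above; this sketch only shows the single-process shape of the dataflow.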
