On Mon, Mar 2, 2015 at 6:47 AM, 宋轶 <[email protected]> wrote:
> The problem of it is that each mapper will generate too much intermediate
> data, and the network will be the bottleneck in Shuffle phase

This would avoid multiple passes over the input data. Is the amount of shuffled data actually any different from what multiple map-reduce steps would shuffle in total?
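
To make the question concrete, here is a toy sketch of the comparison. It is only an illustration under stated assumptions, not the actual job from this thread: the records, the two groupings, and the function names are all hypothetical, and it assumes no combiner and no compression, so "shuffled data" is simply the list of key-value pairs the mappers emit.

```python
from collections import Counter

# Hypothetical input records: (country, browser, bytes_sent).
records = [
    ("us", "firefox", 10),
    ("us", "chrome", 20),
    ("de", "firefox", 5),
    ("de", "chrome", 7),
]

# Hypothetical aggregations we want: one result set per grouping.
groupings = {
    "by_country": lambda r: r[0],
    "by_browser": lambda r: r[1],
}

def single_pass(records):
    """One job: every record is read once and emits one pair per grouping."""
    emitted = []
    for r in records:
        for name, key_fn in groupings.items():
            emitted.append(((name, key_fn(r)), r[2]))
    return emitted

def multi_pass(records):
    """One job per grouping: the whole input is re-read for each job."""
    emitted = []
    for name, key_fn in groupings.items():
        for r in records:
            emitted.append(((name, key_fn(r)), r[2]))
    return emitted

one = single_pass(records)
many = multi_pass(records)

# Records read from the input in each strategy.
input_reads_single = len(records)
input_reads_multi = len(records) * len(groupings)

print("pairs shuffled, single pass:", len(one))
print("pairs shuffled, multi pass: ", len(many))
print("input reads:", input_reads_single, "vs", input_reads_multi)
```

Under these assumptions both strategies emit exactly the same pairs, so the total shuffle volume is the same; the single pass only saves the repeated reads of the input. A combiner or a different key layout in the real job could change that picture.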
