1. One-step building is better suited to incremental builds, which have a small data size. Full builds over a large data set can still use multi-stage building.
2. Since the mapper manages memory by itself, it will cache as much of the intermediate result in memory as possible. Moreover, the mapper will pre-aggregate in memory, just like a combiner. In this way, it should reduce the shuffle data size.

3. Since it is one-step building, the data read size and job scheduling latency should be much lower.

Thanks,
Jiang Xu

------------------ Original Message ------------------
From: Ted Dunning <[email protected]>
Date: 2015-03-02 13:52
To: dev <[email protected]>
Subject: Re: proposal of cube building optimization

On Mon, Mar 2, 2015 at 6:47 AM, Jiang Xu <[email protected]> wrote:

> The problem of it is that each mapper will generate too much intermediate
> data, and the network will be the bottleneck in the Shuffle phase.

This would prevent multiple passes over the input data.

Is there a difference in the amount of shuffled data from the amount that
would be shuffled by multiple map-reduce steps?
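The in-mapper pre-aggregation described in point 2 can be sketched as below. This is a minimal illustration of the general "in-mapper combining" pattern, not Kylin's actual implementation; the class name, the string/long key-value types, and the flush threshold are all hypothetical simplifications (a real mapper would emit cuboid keys and measure values through the Hadoop `context.write` API).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of in-mapper combining: aggregate records in a
// memory cache so that only pre-aggregated data reaches the shuffle.
class InMapperAggregator {
    private final Map<String, Long> cache = new HashMap<>();
    private final int flushThreshold;

    InMapperAggregator(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Instead of emitting one record per input row, merge the measure
    // into the in-memory cache, flushing when memory pressure grows.
    void map(String cuboidKey, long measure) {
        cache.merge(cuboidKey, measure, Long::sum);
        if (cache.size() >= flushThreshold) {
            flush();
        }
    }

    // Emit the aggregated records; in a real mapper this would call
    // context.write(...) for each entry before clearing the cache.
    Map<String, Long> flush() {
        Map<String, Long> out = new HashMap<>(cache);
        cache.clear();
        return out;
    }
}
```

With this pattern, N input rows that share a cuboid key shuffle as a single aggregated record rather than N separate ones, which is exactly why point 2 expects a smaller shuffle.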
