1. One-step building is better suited to incremental builds, which have small
data sizes. Full builds over large data sets can still use multi-stage
building.


2. Since the mapper manages memory by itself, it can cache as much of the
intermediate result in memory as possible. Moreover, the mapper performs
pre-aggregation in memory, just like a combiner. This should reduce the
shuffle data size.
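The in-memory pre-aggregation described above can be sketched as follows. This is an illustrative Python sketch of the general in-mapper-combining pattern, not Kylin's actual MapReduce code; the function and the `memory_limit` parameter are hypothetical names:

```python
from collections import defaultdict

def map_with_preaggregation(records, memory_limit=100_000):
    """Sketch of in-mapper combining: instead of emitting one
    (key, value) pair per input record, the mapper aggregates
    partial sums in an in-memory cache and only flushes when the
    cache grows too large. Fewer emitted pairs means less data
    sent through the shuffle phase."""
    cache = defaultdict(int)
    for key, value in records:
        cache[key] += value           # pre-aggregate, like a combiner
        if len(cache) >= memory_limit:
            yield from cache.items()  # flush partial aggregates
            cache.clear()
    yield from cache.items()          # final flush at mapper cleanup
```

For example, `dict(map_with_preaggregation([("a", 1), ("a", 2), ("b", 3)]))` emits two pairs instead of three, since the two `"a"` records are merged before the shuffle.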


3. Since it is one-step building, the data read size and the job-scheduling
latency should be much lower.


Thanks
Jiang Xu


------------------ Original Message ------------------
From: Ted Dunning <[email protected]>
Date: Mar 2, 2015, 13:52
To: dev <[email protected]>
Subject: Re: proposal of cube building optimization



On Mon, Mar 2, 2015 at 6:47 AM, Jiang Xu <[email protected]> wrote:

> The problem of it is that each mapper will generate too much intermediate
> data, and the network will be the bottleneck in Shuffle phase


This would prevent multiple passes over the input data. Is there a
difference between the amount of data shuffled here and the amount that
would be shuffled by multiple map-reduce steps?
