Hi Xu,
Thanks for the proposal, but I don't quite understand your new algorithm.
1. Steps #2 and #3 you mentioned are built-in features of the MapReduce 
computation framework (partition, sort, and spill).
2. I remember we tried this one-stage method for cube building in our POC 
phase. The problem with it is that each mapper generates too much 
intermediate data, so the network becomes the bottleneck in the shuffle 
phase.
Thanks,
George Song

> From: [email protected]
> To: [email protected]
> Subject: proposal of cube building optimization
> Date: Sat, 28 Feb 2015 11:24:59 +0800
> 
> Hi Guys,
> 
> 
> Now the cube is built by a multi-stage MapReduce job, which may introduce 
> unnecessary latency in some cases (e.g. incremental building). 
> 
> 
> We can introduce another cube building algorithm as below:
> 1. When the mapper processes a raw record, it generates all the valid 
> combination records and puts them into memory.
> 2. When memory is almost full, the mapper writes all the buffered 
> combination records to the reducer. 
> 3. After the mapper writes the records to the reducer, it clears the 
> memory for further processing.
> 
> 
> Basically, the mapper logically splits the data block by the memory limit.
> 
> 
> Thanks
> Jiang Xu
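For reference, the three quoted steps might be sketched roughly as below. This is a hypothetical, simplified illustration, not Kylin code: it assumes a "combination record" is a cuboid key with absent dimensions encoded as "*", models "memory almost full" as an entry-count threshold, and uses a plain list as a stand-in for the shuffle to reducers. The class and method names (InMemCubeSketch, combinations, process, flush) are made up for the sketch.

```java
import java.util.*;

// Hypothetical sketch of the proposed one-stage mapper: each raw record
// expands into all dimension combinations (step 1); counts are aggregated
// in memory and flushed to the reducer whenever the buffer hits a memory
// limit (step 2), after which the buffer is cleared (step 3).
public class InMemCubeSketch {

    // Step 1: enumerate every dimension combination of one record,
    // marking absent dimensions with "*" (an assumed encoding).
    static List<String> combinations(String[] dims) {
        List<String> out = new ArrayList<>();
        int n = dims.length;
        for (int mask = 0; mask < (1 << n); mask++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                sb.append((mask & (1 << i)) != 0 ? dims[i] : "*");
                if (i < n - 1) sb.append(',');
            }
            out.add(sb.toString());
        }
        return out;
    }

    // Steps 2-3: aggregate combinations in memory; when the buffer is
    // "almost full" (entry-count threshold here), write everything to the
    // stand-in reducer sink and clear the memory for further processing.
    static List<String> process(List<String[]> records, int flushThreshold) {
        Map<String, Long> buffer = new HashMap<>();
        List<String> emitted = new ArrayList<>(); // stand-in for the shuffle
        for (String[] rec : records) {
            for (String combo : combinations(rec)) {
                buffer.merge(combo, 1L, Long::sum);
            }
            if (buffer.size() >= flushThreshold) {
                flush(buffer, emitted);
            }
        }
        flush(buffer, emitted); // final flush at end of the split
        return emitted;
    }

    static void flush(Map<String, Long> buffer, List<String> sink) {
        for (Map.Entry<String, Long> e : buffer.entrySet()) {
            sink.add(e.getKey() + "\t" + e.getValue());
        }
        buffer.clear();
    }
}
```

Note that a key flushed in two different batches is emitted twice with partial counts, which is exactly the intermediate-data blow-up George raises: the shuffle must carry every partial record to the reducers for the final merge.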