[ 
https://issues.apache.org/jira/browse/KYLIN-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang updated KYLIN-607:
-------------------------
    Summary: More efficient cube building  (was: More efficient in mem cubing)

> More efficient cube building
> ----------------------------
>
>                 Key: KYLIN-607
>                 URL: https://issues.apache.org/jira/browse/KYLIN-607
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Job Engine
>            Reporter: liyang
>            Assignee: liyang
>             Fix For: v0.8.1
>
>
> Right now cube building proceeds layer by layer over the spanning tree. The 
> algorithm results in a total shuffle size of around [Avg Cardinality] * 
> [Total Cube Size]. This is currently the biggest bottleneck of cube building 
> in the eBay deployment.
> Proposing a different algorithm:
> 1. Each mapper independently builds a cube segment and outputs it.
> 2. One round of shuffle merge-sorts the segments.
> 3. The reducer outputs the final merged cube.
> This could achieve 1 * [Total Cube Size] of shuffling when there is a 
> mandatory dimension and each mapper takes a different slice of that 
> dimension. E.g. month is mandatory and each mapper is assigned a different 
> month of data.
> This algorithm is also friendlier to streaming.
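
The three steps in the description can be sketched in plain Python as below. This is only an illustration under assumptions, not Kylin code: the dimension names, the `amount` measure, the SUM aggregation, and the helpers `cuboids`, `build_segment`, and `merge_segments` are all hypothetical, and the in-process `heapq.merge` merely stands in for the MapReduce shuffle.

```python
import heapq
from collections import defaultdict
from itertools import combinations, groupby
from operator import itemgetter

# Hypothetical schema: "month" plays the role of the mandatory dimension.
DIMS = ("month", "country", "category")
MANDATORY = ("month",)


def cuboids(dims, mandatory):
    """Yield every dimension subset that contains all mandatory dimensions."""
    optional = [d for d in dims if d not in mandatory]
    for r in range(len(optional) + 1):
        for combo in combinations(optional, r):
            yield tuple(d for d in dims if d in mandatory or d in combo)


def build_segment(rows):
    """Step 1 (mapper): aggregate one input split into a sorted cube segment."""
    agg = defaultdict(int)
    for row in rows:
        for cuboid in cuboids(DIMS, MANDATORY):
            key = (cuboid, tuple(row[d] for d in cuboid))
            agg[key] += row["amount"]  # SUM is an assumed measure
    return sorted(agg.items())


def merge_segments(segments):
    """Steps 2 and 3: merge-sort the segments, then sum values of equal keys."""
    merged = heapq.merge(*segments, key=itemgetter(0))
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(merged, key=itemgetter(0))
    ]


# Each "mapper" receives a different month, as in the example in the description.
jan = [
    {"month": "2015-01", "country": "US", "category": "toys", "amount": 10},
    {"month": "2015-01", "country": "US", "category": "books", "amount": 5},
]
feb = [
    {"month": "2015-02", "country": "US", "category": "toys", "amount": 7},
]

segments = [build_segment(jan), build_segment(feb)]
cube = dict(merge_segments(segments))
shuffled_records = sum(len(segment) for segment in segments)
```

Because every cuboid contains the month and no two mappers share a month, no key appears in more than one segment. The number of records shuffled (`shuffled_records`) therefore equals the number of records in the final cube (`len(cube)`), which is the 1 * [Total Cube Size] case from the description. If the splits overlapped on the mandatory dimension, the same code would still produce a correct cube, but the shuffle would carry duplicate keys.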



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)