[ https://issues.apache.org/jira/browse/KYLIN-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Han updated KYLIN-607:
---------------------------
Sprint: Sprint 42, Sprint 43, Sprint 44, Sprint 45, Sprint 46 (was: Sprint
42, Sprint 43, Sprint 44, Sprint 45)
> More efficient cube building
> ----------------------------
>
> Key: KYLIN-607
> URL: https://issues.apache.org/jira/browse/KYLIN-607
> Project: Kylin
> Issue Type: New Feature
> Components: Job Engine
> Reporter: liyang
> Assignee: liyang
> Fix For: v0.8.1
>
>
> Right now cube building proceeds layer by layer along the spanning tree. This
> algorithm results in a total shuffle size of around
> [Avg Cardinality] * [Total Cube Size]. This is currently the biggest
> bottleneck of cube building in the eBay deployment.
> I propose a different algorithm:
> 1. Each mapper independently builds a cube segment and outputs it.
> 2. One round of shuffle merge-sorts the segments.
> 3. The reducer outputs the final merged cube.
> This could achieve 1 * [Total Cube Size] of shuffling when there is a
> mandatory dimension and each mapper takes a different slice of that
> dimension. E.g. month is mandatory and each mapper is assigned a different
> month of data.
> This algorithm is also friendlier to streaming.
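
The three steps above can be sketched in a few lines of Python. This is only a toy in-memory model under stated assumptions, not Kylin's job engine code: the function names (cuboids, build_segment, merge_segments), the sample rows, and the single additive "sales" measure are all hypothetical. It assumes a mandatory dimension is one that appears in every cuboid, and it counts the records each mapper emits as a stand-in for shuffle size.

```python
import heapq
from collections import defaultdict
from itertools import combinations, groupby

def cuboids(dims, mandatory):
    """Yield every dimension subset that contains all mandatory dimensions."""
    optional = [d for d in dims if d not in mandatory]
    for r in range(len(optional) + 1):
        for combo in combinations(optional, r):
            yield tuple(d for d in dims if d in mandatory or d in combo)

def build_segment(rows, dims, mandatory, measure):
    """Step 1 (mapper): aggregate one input split into a local cube segment."""
    segment = defaultdict(int)
    for row in rows:
        for cuboid in cuboids(dims, mandatory):
            key = (cuboid, tuple(row[d] for d in cuboid))
            segment[key] += row[measure]
    return segment

def merge_segments(segments):
    """Steps 2 and 3: merge-sort the sorted segments, then sum equal keys."""
    streams = [sorted(seg.items()) for seg in segments]
    shuffled = sum(len(s) for s in streams)  # records crossing the shuffle
    cube = {}
    for key, group in groupby(heapq.merge(*streams), key=lambda kv: kv[0]):
        cube[key] = sum(value for _, value in group)
    return cube, shuffled

# Hypothetical sample data; "month" plays the role of the mandatory dimension.
DIMS = ["month", "country", "category"]
MANDATORY = {"month"}
ROWS = [
    {"month": "2015-01", "country": "US", "category": "toys", "sales": 10},
    {"month": "2015-01", "country": "US", "category": "books", "sales": 5},
    {"month": "2015-02", "country": "US", "category": "toys", "sales": 7},
    {"month": "2015-02", "country": "DE", "category": "toys", "sales": 3},
]

def run(splits):
    segments = [build_segment(s, DIMS, MANDATORY, "sales") for s in splits]
    return merge_segments(segments)

# Each mapper gets exactly one month: no key is emitted by two mappers.
aligned_cube, aligned_shuffled = run([ROWS[0:2], ROWS[2:4]])

# Splits that mix months: the same key is emitted by more than one mapper.
mixed_cube, mixed_shuffled = run([[ROWS[0], ROWS[2]], [ROWS[1], ROWS[3]]])
```

In this toy run the month-aligned splits shuffle exactly as many records as the final cube holds, which is the 1 * [Total Cube Size] case described above. The mixed splits produce the same cube but shuffle more records, because several mappers emit partial aggregates for the same key.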