[ 
https://issues.apache.org/jira/browse/KYLIN-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhong Yanghong updated KYLIN-2826:
----------------------------------
    Fix Version/s: v2.2.0

> Add basic support classes for cube planner algorithms
> -----------------------------------------------------
>
>                 Key: KYLIN-2826
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2826
>             Project: Kylin
>          Issue Type: Sub-task
>    Affects Versions: v2.1.0
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>             Fix For: v2.2.0
>
>
> The cube planner aims to recommend cost-effective cuboids. Currently the cost 
> model considers only the {color:#f79232}*scanned row count*{color} at the 
> {color:#f79232}*query phase*{color}. The related formula is as follows:
> bq. cuboid cost = scanned row count on target cuboid * query probability
> The base cuboid is always prebuilt. If only the base cuboid is prebuilt, it 
> is the target cuboid for every other cuboid, and the _(scanned row count)_ is 
> expected to be large. When another cuboid is selected to be prebuilt, it 
> becomes the target cuboid for itself and its descendant cuboids, and their 
> _(scanned row count)_ is expected to become smaller. Thus, the newly selected 
> cuboid brings some benefit. We employ BPUS (benefit per unit space) for 
> cuboid selection. The formula for the benefit of a cuboid is as follows:
> bq. cuboid benefit = (total reduced cuboid cost) / (cuboid row count)
> Cuboid selection is based on one basic rule:
> bq. {color:#f79232}*RULE 1: Cuboids with more benefit will be 
> preferred.*{color}
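The two formulas above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the code attached to this issue: the class name BpusBenefitSketch, the method names, and the encoding of a cuboid id as a bitmask of its dimensions are all hypothetical, and the row counts and probabilities in main are made-up numbers.

```java
import java.util.Map;
import java.util.Set;

final class BpusBenefitSketch {
    // Assumption: a cuboid id is a bitmask of its dimensions, so 'parent' can
    // answer queries on 'child' when it contains every dimension of 'child'.
    static boolean canAnswer(long parent, long child) {
        return (parent & child) == child;
    }

    // Row count of the smallest prebuilt cuboid that can answer 'cuboid'.
    // The base cuboid is always prebuilt, so a target always exists.
    static long targetRows(long cuboid, Set<Long> prebuilt, Map<Long, Long> rowCount) {
        long best = Long.MAX_VALUE;
        for (long p : prebuilt) {
            if (canAnswer(p, cuboid)) {
                best = Math.min(best, rowCount.get(p));
            }
        }
        return best;
    }

    // cuboid cost = scanned row count on target cuboid * query probability
    static double cost(long cuboid, Set<Long> prebuilt,
                       Map<Long, Long> rowCount, Map<Long, Double> queryProb) {
        return targetRows(cuboid, prebuilt, rowCount) * queryProb.get(cuboid);
    }

    // cuboid benefit = (total reduced cuboid cost) / (cuboid row count)
    static double benefit(long candidate, Set<Long> prebuilt,
                          Map<Long, Long> rowCount, Map<Long, Double> queryProb) {
        long candidateRows = rowCount.get(candidate);
        double reduced = 0.0;
        for (long cuboid : rowCount.keySet()) {
            if (!canAnswer(candidate, cuboid)) {
                continue; // not the candidate itself or one of its descendants
            }
            long currentRows = targetRows(cuboid, prebuilt, rowCount);
            if (currentRows > candidateRows) {
                reduced += (currentRows - candidateRows) * queryProb.get(cuboid);
            }
        }
        return reduced / candidateRows;
    }

    public static void main(String[] args) {
        // Made-up statistics for a three-dimension cube; 0b111 is the base cuboid.
        Map<Long, Long> rowCount = Map.of(0b111L, 1000L, 0b110L, 100L, 0b100L, 10L, 0b010L, 20L);
        Map<Long, Double> queryProb = Map.of(0b111L, 0.25, 0b110L, 0.25, 0b100L, 0.25, 0b010L, 0.25);
        Set<Long> prebuilt = Set.of(0b111L);
        System.out.println(benefit(0b110L, prebuilt, rowCount, queryProb));
        System.out.println(benefit(0b100L, prebuilt, rowCount, queryProb));
    }
}
```

With these numbers, cuboid 0b110 reduces the cost of three cuboids by 225 each for a benefit of 675 / 100 = 6.75, while cuboid 0b100 only helps itself but is much smaller, for a benefit of 247.5 / 10 = 24.75. Under RULE 1, 0b100 would be preferred.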
> For a cube, the cube planner can be used in two phases.
> * Phase one is for normal cube building.
> To use the cube planner in this phase, the cube must be empty or the build 
> job must be refreshing its only segment. In this phase, we assume that every 
> cuboid has the same _(query probability)_ because no query statistics are 
> available.
> * Phase two is for cube optimization.
> Currently cube optimization is triggered manually. In this phase, _(query 
> probability)_ is taken into account, and the related query statistics are 
> fetched from system cubes. Based on _(query probability)_, it is possible to 
> add missing cuboids that have no cuboid row count info. This relies on a rule 
> called the {color:#f79232}*mandatory rule*{color}.
> bq. {color:#f79232}*RULE 2: A cuboid not pre-built should be added, if it's 
> queried frequently and the average rollup row count from its pre-built parent 
> cuboid is large.*{color}
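One possible reading of RULE 2 is sketched below. The issue does not say how "queried frequently" or "large" are decided, so the two thresholds, the class name MandatoryRuleSketch, and the interpretation of the average rollup row count as total rolled-up rows divided by hit count are all assumptions made for illustration.

```java
final class MandatoryRuleSketch {
    /**
     * Decides whether a cuboid that is not prebuilt should be added.
     *
     * @param hitCount          how many queries hit this cuboid (from query statistics)
     * @param totalRollupRows   rows rolled up from its prebuilt parent across those queries
     * @param minHitCount       hypothetical threshold for "queried frequently"
     * @param minAvgRollupRows  hypothetical threshold for a "large" average rollup
     */
    static boolean shouldAdd(long hitCount, long totalRollupRows,
                             long minHitCount, double minAvgRollupRows) {
        if (hitCount <= 0) {
            return false; // never queried, so no average can be computed
        }
        double avgRollupRows = (double) totalRollupRows / hitCount;
        return hitCount >= minHitCount && avgRollupRows >= minAvgRollupRows;
    }

    public static void main(String[] args) {
        // Made-up numbers: 200 hits that rolled up 2,000,000 parent rows in total.
        System.out.println(shouldAdd(200, 2_000_000, 100, 5000.0));
    }
}
```

Note that the check needs only hit counts and rollup row counts, which matches the statement that such cuboids can be added without cuboid row count info.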
> As the introduction above shows, the cube planner is based on statistics, 
> including cuboid row count, cuboid hit frequency, etc. The class 
> {{CuboidStats}} is introduced to provide this information to the related 
> algorithms. 
> Here, we also define the interface {{CuboidRecommendAlgorithm}} for different 
> kinds of cube planner algorithms. Without a space limitation, prebuilding all 
> of the cuboids would bring the most benefit. However, that is not feasible in 
> the real world. Under a space limitation, the interface defines a method that 
> recommends a set of high-benefit cuboids.
> {code}
> List<Long> recommend(double expansionRate);
> {code}
> Here, the expansion rate is relative to the size of the base cuboid.
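To make the role of the expansion rate concrete, here is a minimal greedy sketch of how an implementation of recommend might behave. It is not one of the algorithms delivered with this issue. The interface below is a local stand-in that declares only the method quoted above; GreedyRecommendSketch, its constructor, and the pluggable benefit function are hypothetical; and measuring the space budget in rows (expansion rate times the base cuboid row count, with the base cuboid counted against it) is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

// Stand-in for the interface named in the issue; only the quoted method is declared.
interface CuboidRecommendAlgorithm {
    List<Long> recommend(double expansionRate);
}

final class GreedyRecommendSketch implements CuboidRecommendAlgorithm {
    private final long baseCuboid;
    private final Map<Long, Long> rowCount;
    // Hypothetical hook: benefit of a candidate given the cuboids selected so far.
    private final ToDoubleBiFunction<Long, Set<Long>> benefit;

    GreedyRecommendSketch(long baseCuboid, Map<Long, Long> rowCount,
                          ToDoubleBiFunction<Long, Set<Long>> benefit) {
        this.baseCuboid = baseCuboid;
        this.rowCount = rowCount;
        this.benefit = benefit;
    }

    @Override
    public List<Long> recommend(double expansionRate) {
        // Assumption: the budget is expressed in rows relative to the base cuboid.
        double budget = expansionRate * rowCount.get(baseCuboid);
        double used = rowCount.get(baseCuboid);
        List<Long> selected = new ArrayList<>();
        selected.add(baseCuboid); // the base cuboid is always prebuilt
        Set<Long> prebuilt = new HashSet<>(selected);

        while (true) {
            Long best = null;
            double bestBenefit = 0.0;
            for (Map.Entry<Long, Long> e : rowCount.entrySet()) {
                long candidate = e.getKey();
                if (prebuilt.contains(candidate) || used + e.getValue() > budget) {
                    continue; // already selected, or it would exceed the budget
                }
                double b = benefit.applyAsDouble(candidate, prebuilt);
                if (b > bestBenefit) {
                    best = candidate;
                    bestBenefit = b;
                }
            }
            if (best == null) {
                break; // nothing fits or nothing brings a positive benefit
            }
            selected.add(best);
            prebuilt.add(best);
            used += rowCount.get(best);
        }
        return selected;
    }
}
```

Each round picks the candidate with the highest benefit that still fits, in line with RULE 1. Passing the benefit as a function keeps the loop independent of how the statistics are obtained. With an expansion rate of 1.0, only the base cuboid fits and it is returned alone.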



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
