[ https://issues.apache.org/jira/browse/KYLIN-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhong Yanghong updated KYLIN-2826:
----------------------------------
Fix Version/s: v2.2.0
> Add basic support classes for cube planner algorithms
> -----------------------------------------------------
>
> Key: KYLIN-2826
> URL: https://issues.apache.org/jira/browse/KYLIN-2826
> Project: Kylin
> Issue Type: Sub-task
> Affects Versions: v2.1.0
> Reporter: Zhong Yanghong
> Assignee: Zhong Yanghong
> Fix For: v2.2.0
>
>
> Cube planner aims to recommend cost-effective cuboids. Currently we only
> consider {color:#f79232}*scanned row count*{color} at {color:#f79232}*query
> phase*{color} for the cost. The related formula is as follows:
> bq. cuboid cost = scanned row count on target cuboid * query probability
> The base cuboid is always prebuilt. If only the base cuboid is prebuilt, it
> is the target cuboid for every other cuboid, so their _(scanned row count)_
> is large. When another cuboid is selected to be prebuilt, it becomes the
> target cuboid for itself and its descendant cuboids, and their
> _(scanned row count)_ is expected to become smaller. Thus, the newly
> selected cuboid brings some benefit.
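The cost model above can be sketched in Java as follows. This is a minimal illustration, not Kylin's code. It assumes cuboids are identified by {{long}} bitmasks with one bit per dimension, that a prebuilt cuboid can answer a query on another cuboid when it contains all of that cuboid's dimensions, and that the target cuboid is the smallest such prebuilt cuboid by row count. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the query-phase cost model. Assumptions: cuboid IDs are
// long bitmasks (one bit per dimension), and the target cuboid is the prebuilt
// cuboid with the fewest rows that contains all dimensions of the queried one.
final class CuboidCostSketch {

    // A prebuilt cuboid can answer a query on `child` if it has all its dimensions.
    static boolean canAnswer(long parent, long child) {
        return (parent & child) == child;
    }

    static long targetCuboid(long cuboid, Set<Long> prebuilt, Map<Long, Long> rowCounts) {
        Long best = null;
        for (long candidate : prebuilt) {
            if (canAnswer(candidate, cuboid)
                    && (best == null || rowCounts.get(candidate) < rowCounts.get(best))) {
                best = candidate;
            }
        }
        if (best == null) {
            throw new IllegalStateException("no prebuilt cuboid can answer " + cuboid);
        }
        return best;
    }

    // cuboid cost = scanned row count on target cuboid * query probability
    static double cuboidCost(long cuboid, double queryProbability,
                             Set<Long> prebuilt, Map<Long, Long> rowCounts) {
        long target = targetCuboid(cuboid, prebuilt, rowCounts);
        return rowCounts.get(target) * queryProbability;
    }
}
```

For example, with a base cuboid {{0b111}} of 1000 rows and a cuboid {{0b011}} of 100 rows, a query on {{0b001}} with probability 0.5 costs 500.0 while only the base cuboid is prebuilt, and 50.0 once {{0b011}} is prebuilt as well.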
> We employ BPUS (benefit per unit space) for cuboid selection. The related
> formula for the benefit of a cuboid is as follows:
> bq. cuboid benefit = (total reduced cuboid cost) / (cuboid row count)
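The BPUS benefit formula can be sketched the same way. This is again a hypothetical illustration under the same bitmask assumption. It also assumes the base cuboid is already in the prebuilt set, so every cuboid has a target, and that only the candidate and its descendant cuboids can have their _(scanned row count)_ reduced.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of BPUS (benefit per unit space). Assumptions: cuboid IDs
// are long bitmasks, row counts and query probabilities are known, and the base
// cuboid is in the prebuilt set so every cuboid has some target cuboid.
final class BpusSketch {

    static boolean canAnswer(long parent, long child) {
        return (parent & child) == child;
    }

    // Row count of the smallest prebuilt cuboid that can answer `cuboid`.
    static long scannedRows(long cuboid, Set<Long> prebuilt, Map<Long, Long> rowCounts) {
        long best = Long.MAX_VALUE;
        for (long p : prebuilt) {
            if (canAnswer(p, cuboid)) {
                best = Math.min(best, rowCounts.get(p));
            }
        }
        return best;
    }

    // cuboid benefit = (total reduced cuboid cost) / (cuboid row count)
    static double benefit(long candidate, Set<Long> prebuilt,
                          Map<Long, Long> rowCounts, Map<Long, Double> queryProbability) {
        long candidateRows = rowCounts.get(candidate);
        double reducedCost = 0.0;
        for (Map.Entry<Long, Double> e : queryProbability.entrySet()) {
            long cuboid = e.getKey();
            if (!canAnswer(candidate, cuboid)) {
                continue; // only the candidate and its descendants can be affected
            }
            long current = scannedRows(cuboid, prebuilt, rowCounts);
            if (candidateRows < current) {
                reducedCost += (current - candidateRows) * e.getValue();
            }
        }
        return reducedCost / candidateRows;
    }
}
```

With row counts 1000, 100 and 10 for {{0b111}}, {{0b011}} and {{0b001}}, query probabilities 0.2, 0.4 and 0.4, and only the base cuboid prebuilt, {{0b011}} gets a benefit of 7.2 and {{0b001}} gets 39.6, so the smaller {{0b001}} earns more benefit per unit space.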
> Cuboid selection is based on one basic rule:
> bq. {color:#f79232}*RULE 1: Cuboids with more benefit will be
> preferred.*{color}
> For a cube, cube planner can be used in two phases.
> * Phase one is for normal cube building.
> To use cube planner in this phase, the cube should be empty, or the build
> job should be refreshing its only segment. In this phase, we assume every
> cuboid has the same _(query probability)_ because no query statistics are
> available yet.
> * Phase two is for cube optimization.
> Currently cube optimization is triggered manually. _(query probability)_ is
> taken into account, and the related query statistics are fetched from the
> system cubes. Based on _(query probability)_, we can also add missing
> cuboids even without their cuboid row count info. This relies on a rule
> called the {color:#f79232}*mandatory rule*{color}.
> bq. {color:#f79232}*RULE 2: A cuboid that is not prebuilt should be added if
> it is queried frequently and the average rollup row count from its prebuilt
> parent cuboid is large.*{color}
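RULE 2 could be checked roughly as below. This is a hypothetical sketch: the {{QueryStats}} shape, the idea of comparing hit count and average rollup row count against fixed thresholds, and the threshold parameters are all assumptions, since the issue does not say how "frequently" or "large" is measured. Note that the decision never needs the candidate cuboid's own row count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of RULE 2 (the mandatory rule). The statistics shape and
// the threshold parameters are assumptions; the issue does not define them.
final class MandatoryRuleSketch {

    // Assumed per-cuboid query statistics, e.g. aggregated from the system cubes.
    static final class QueryStats {
        final long hitCount;
        final double avgRollupRowCount; // average rollup rows from the prebuilt parent

        QueryStats(long hitCount, double avgRollupRowCount) {
            this.hitCount = hitCount;
            this.avgRollupRowCount = avgRollupRowCount;
        }
    }

    // Returns queried cuboids that are not prebuilt but are hit often and roll up
    // many rows from their prebuilt parent; no cuboid row count is needed.
    static List<Long> mandatoryCuboids(Map<Long, QueryStats> statsByCuboid,
                                       Set<Long> prebuilt,
                                       long minHitCount,
                                       double minAvgRollupRows) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, QueryStats> e : statsByCuboid.entrySet()) {
            QueryStats s = e.getValue();
            boolean notPrebuilt = !prebuilt.contains(e.getKey());
            boolean queriedFrequently = s.hitCount >= minHitCount;
            boolean largeRollup = s.avgRollupRowCount >= minAvgRollupRows;
            if (notPrebuilt && queriedFrequently && largeRollup) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result); // deterministic order
        return result;
    }
}
```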
> As described above, cube planner relies on statistics such as cuboid row
> count and cuboid hit frequency. Class {{CuboidStats}} is introduced to
> provide this information to the related algorithms.
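To illustrate how the per-phase _(query probability)_ could be derived from such statistics, here is a hypothetical stand-in for {{CuboidStats}}; its fields and methods are assumptions, not the real class. It gives every cuboid the same probability when no hit statistics exist (phase one) and uses each cuboid's share of recorded hits otherwise (phase two). For simplicity it only covers cuboids that have a row count.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CuboidStats; NOT the real Kylin class. Holds cuboid
// row counts and hit frequencies and derives a query probability per cuboid.
final class CuboidStatsSketch {
    private final Map<Long, Long> rowCounts;
    private final Map<Long, Long> hitFrequencies; // empty in phase one

    CuboidStatsSketch(Map<Long, Long> rowCounts, Map<Long, Long> hitFrequencies) {
        this.rowCounts = Map.copyOf(rowCounts);
        this.hitFrequencies = Map.copyOf(hitFrequencies);
    }

    long rowCount(long cuboid) {
        return rowCounts.get(cuboid);
    }

    // Phase one (no hits recorded): every cuboid gets the same probability.
    // Phase two: probability is the cuboid's share of all recorded hits.
    Map<Long, Double> queryProbabilities() {
        long totalHits = 0;
        for (long cuboid : rowCounts.keySet()) {
            totalHits += hitFrequencies.getOrDefault(cuboid, 0L);
        }
        Map<Long, Double> result = new HashMap<>();
        for (long cuboid : rowCounts.keySet()) {
            double p = totalHits == 0
                    ? 1.0 / rowCounts.size()
                    : hitFrequencies.getOrDefault(cuboid, 0L).doubleValue() / totalHits;
            result.put(cuboid, p);
        }
        return result;
    }
}
```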
> We also define the interface {{CuboidRecommendAlgorithm}} for the different
> kinds of cube planner algorithms. Without any space limitation, prebuilding
> all of the cuboids would bring the most benefit, but this is not feasible in
> the real world. Under a space limitation, the interface recommends a set of
> high-benefit cuboids:
> {code}
> List<Long> recommend(double expansionRate);
> {code}
> Here, the expansion rate is relative to the size of the base cuboid.
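Combining BPUS with RULE 1 gives one possible greedy implementation of {{recommend}}. The interface is redeclared so the sketch compiles on its own. Everything else is an assumption for illustration, not Kylin's algorithm: cuboid IDs are {{long}} bitmasks, space is measured in rows, the budget is {{expansionRate}} times the base cuboid's row count with the base cuboid counted against it, and selection stops when no affordable cuboid has a positive benefit.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Interface as described in the issue, redeclared so this sketch compiles alone.
interface CuboidRecommendAlgorithm {
    List<Long> recommend(double expansionRate);
}

// Hypothetical greedy BPUS recommender (not Kylin's implementation).
// Assumptions: cuboid IDs are long bitmasks, space is measured in rows, and the
// budget, which includes the base cuboid, is expansionRate * base cuboid rows.
final class GreedyBpusSketch implements CuboidRecommendAlgorithm {
    private final long baseCuboid;
    private final Map<Long, Long> rowCounts;
    private final Map<Long, Double> queryProbability;

    GreedyBpusSketch(long baseCuboid, Map<Long, Long> rowCounts,
                     Map<Long, Double> queryProbability) {
        this.baseCuboid = baseCuboid;
        this.rowCounts = Map.copyOf(rowCounts);
        this.queryProbability = Map.copyOf(queryProbability);
    }

    private static boolean canAnswer(long parent, long child) {
        return (parent & child) == child;
    }

    // Rows scanned for a query on `cuboid`, given the currently selected set.
    private long scannedRows(long cuboid, Set<Long> selected) {
        long best = Long.MAX_VALUE;
        for (long p : selected) {
            if (canAnswer(p, cuboid)) {
                best = Math.min(best, rowCounts.get(p));
            }
        }
        return best;
    }

    // cuboid benefit = (total reduced cuboid cost) / (cuboid row count)
    private double benefit(long candidate, Set<Long> selected) {
        long candidateRows = rowCounts.get(candidate);
        double reduced = 0.0;
        for (Map.Entry<Long, Double> e : queryProbability.entrySet()) {
            long cuboid = e.getKey();
            if (canAnswer(candidate, cuboid)) {
                long current = scannedRows(cuboid, selected);
                if (candidateRows < current) {
                    reduced += (current - candidateRows) * e.getValue();
                }
            }
        }
        return reduced / candidateRows;
    }

    @Override
    public List<Long> recommend(double expansionRate) {
        double budget = expansionRate * rowCounts.get(baseCuboid);
        Set<Long> selected = new LinkedHashSet<>();
        selected.add(baseCuboid); // the base cuboid is always prebuilt
        double used = rowCounts.get(baseCuboid);
        while (true) {
            Long best = null;
            double bestBenefit = 0.0;
            for (long candidate : rowCounts.keySet()) {
                if (selected.contains(candidate)
                        || used + rowCounts.get(candidate) > budget) {
                    continue; // already selected or does not fit the budget
                }
                double b = benefit(candidate, selected);
                if (b > bestBenefit) { // RULE 1: prefer the most benefit
                    best = candidate;
                    bestBenefit = b;
                }
            }
            if (best == null) {
                return new ArrayList<>(selected);
            }
            selected.add(best);
            used += rowCounts.get(best);
        }
    }
}
```

With row counts 1000, 100, 10 and 200 for {{0b111}}, {{0b011}}, {{0b001}} and {{0b110}} and probabilities 0.1, 0.3, 0.3 and 0.3, an expansion rate of 1.2 yields [7, 1, 3]: {{0b001}} is picked first for its higher benefit, {{0b011}} next, and {{0b110}} no longer fits the budget. A greedy loop is only one way to honour RULE 1; it recomputes benefits after each pick because adding a cuboid changes the targets of its descendants.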