[jira] [Updated] (KYLIN-2826) Add basic support classes for cube planner algorithms

Zhong Yanghong (JIRA) Thu, 31 Aug 2017 07:41:30 -0700

     [ https://issues.apache.org/jira/browse/KYLIN-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Zhong Yanghong updated KYLIN-2826:
----------------------------------
    Description: 
Cube planner aims to recommend cost-effective cuboids. Currently the cost only 
takes into account the {color:#f79232}*scanned row count*{color} at the 
{color:#f79232}*query phase*{color}. The formula is:
bq. cuboid cost = (scanned row count on target cuboid) * (query probability)

The base cuboid is always prebuilt. If only the base cuboid is prebuilt, it is 
the target cuboid for every other cuboid, so the _(scanned row count)_ is 
expected to be large. When another cuboid is selected to be prebuilt, it 
becomes the target cuboid for itself and its descendant cuboids, and their 
_(scanned row count)_ is expected to shrink. The newly selected cuboid 
therefore brings some benefit. We employ BPUS (benefit per unit space) for 
cuboid selection. The benefit of a cuboid is:
bq. cuboid benefit = (total reduced cuboid cost) / (cuboid row count)
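
The two formulas can be made concrete with a small Java sketch. It is only an illustration under stated assumptions, not Kylin code: cuboid ids are treated as dimension bitmasks, a cuboid counts as a descendant when its bits are a subset of the ancestor's, the target cuboid is taken to be the prebuilt ancestor with the fewest rows, and the class name {{BpusFormulaSketch}}, its methods, and the row counts are all made up.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper, not a Kylin class: cost and BPUS benefit on bitmask cuboid ids. */
final class BpusFormulaSketch {

    /** Assumption: child is a descendant of parent when its dimension bits are a subset. */
    static boolean isDescendant(long child, long parent) {
        return (child & parent) == child;
    }

    /** Assumption: the target cuboid is the prebuilt ancestor (or self) with the fewest rows. */
    static long scannedRows(long cuboid, Set<Long> prebuilt, Map<Long, Long> rowCounts) {
        long best = Long.MAX_VALUE; // stays at MAX_VALUE only if no prebuilt cuboid covers this one
        for (long candidate : prebuilt) {
            if (isDescendant(cuboid, candidate)) {
                long rows = rowCounts.get(candidate);
                best = Math.min(best, rows);
            }
        }
        return best;
    }

    /** Sum over all cuboids of (scanned row count on target cuboid) * (query probability). */
    static double totalCost(Set<Long> prebuilt, Map<Long, Long> rowCounts,
                            Map<Long, Double> probability) {
        double cost = 0.0;
        for (Map.Entry<Long, Double> e : probability.entrySet()) {
            cost += scannedRows(e.getKey(), prebuilt, rowCounts) * e.getValue();
        }
        return cost;
    }

    /** cuboid benefit = (total reduced cuboid cost) / (cuboid row count). */
    static double benefit(long candidate, Set<Long> prebuilt, Map<Long, Long> rowCounts,
                          Map<Long, Double> probability) {
        Set<Long> extended = new TreeSet<>(prebuilt);
        extended.add(candidate);
        double reduced = totalCost(prebuilt, rowCounts, probability)
                - totalCost(extended, rowCounts, probability);
        return reduced / rowCounts.get(candidate);
    }

    /** Benefit of one candidate on made-up data where only the base cuboid (7) is prebuilt. */
    static double exampleBenefit(long candidate) {
        Map<Long, Long> rowCounts = new TreeMap<>();
        rowCounts.put(7L, 1000L); // base cuboid, bits 111
        rowCounts.put(6L, 400L);  // bits 110
        rowCounts.put(4L, 50L);   // bits 100
        rowCounts.put(2L, 100L);  // bits 010
        Map<Long, Double> probability = new TreeMap<>();
        for (Long id : rowCounts.keySet()) {
            probability.put(id, 0.25); // uniform query probability
        }
        return benefit(candidate, Collections.singleton(7L), rowCounts, probability);
    }

    public static void main(String[] args) {
        System.out.println(exampleBenefit(6L)); // (1000 - 550) / 400 = 1.125
        System.out.println(exampleBenefit(4L)); // (1000 - 762.5) / 50 = 4.75
    }
}
```

In this made-up data the small cuboid 4 saves less total cost than cuboid 6, but it has the higher benefit because it occupies far less space.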

Cuboid selection is based on one basic rule:
bq. {color:#f79232}*RULE 1: Cuboids with more benefit will be preferred.*{color}

For a cube, cube planner can be used in two phases.
* Phase one is for normal cube building.
To use cube planner in this phase, the cube must be empty, or the build job 
must be refreshing the cube's only segment. In this phase we assume every 
cuboid has the same _(query probability)_, because no query statistics exist yet.
* Phase two is for cube optimization.
Currently cube optimization is triggered manually. _(query probability)_ is 
taken into account, and the query statistics behind it are fetched from the 
system cubes. Based on _(query probability)_, we can also add missing cuboids 
that have no cuboid row count info. This relies on a rule called the 
{color:#f79232}*mandatory rule*{color}.

bq. {color:#f79232}*RULE 2: A cuboid not pre-built should be added, if it's 
queried frequently and the average rollup row count from its pre-built parent 
cuboid is large.*{color}
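
RULE 2 can be read as a simple predicate. The Java sketch below is one possible reading, not Kylin's implementation: the description only says "frequently" and "large", so the class {{MandatoryRuleSketch}}, both thresholds, and the parameter names are hypothetical.

```java
/** Hypothetical reading of RULE 2; the thresholds are assumptions, not Kylin settings. */
final class MandatoryRuleSketch {

    /**
     * Returns true when a cuboid that is not pre-built should be added.
     *
     * @param hitCount          how often queries hit the cuboid
     * @param avgRollupRows     average rollup row count from its pre-built parent cuboid
     * @param minHitCount       assumed cut-off for "queried frequently"
     * @param minAvgRollupRows  assumed cut-off for "large"
     */
    static boolean isMandatory(long hitCount, double avgRollupRows,
                               long minHitCount, double minAvgRollupRows) {
        return hitCount >= minHitCount && avgRollupRows >= minAvgRollupRows;
    }

    public static void main(String[] args) {
        // Frequently queried and expensive to roll up: add it.
        System.out.println(isMandatory(500, 20000.0, 100, 1000.0)); // true
        // Rarely queried: leave it out even though the rollup is expensive.
        System.out.println(isMandatory(5, 20000.0, 100, 1000.0));   // false
    }
}
```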

From the above introduction, we know that cube planner is based on statistics, 
including cuboid row count, cuboid hit frequency, etc. The class {{CuboidStats}} 
is introduced to provide this information to the related algorithms.
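
To illustrate how such statistics might feed the two phases, here is a minimal Java sketch. The class {{CuboidStatsSketch}} is hypothetical and is not the {{CuboidStats}} class of this issue. It assumes the query probability is the cuboid's share of all hits, and that it falls back to a uniform value when no hits are recorded, as in phase one.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical statistics holder; not the CuboidStats class introduced by this issue. */
final class CuboidStatsSketch {
    private final Map<Long, Long> rowCounts; // cuboid id -> row count
    private final Map<Long, Long> hitCounts; // cuboid id -> hit frequency, empty in phase one

    CuboidStatsSketch(Map<Long, Long> rowCounts, Map<Long, Long> hitCounts) {
        this.rowCounts = rowCounts;
        this.hitCounts = hitCounts;
    }

    long rowCount(long cuboid) {
        return rowCounts.get(cuboid);
    }

    /** Assumption: probability = hits / total hits, or uniform when there are no hits. */
    double queryProbability(long cuboid) {
        long totalHits = 0;
        for (long hits : hitCounts.values()) {
            totalHits += hits;
        }
        if (totalHits == 0) {
            return 1.0 / rowCounts.size(); // phase one: no query statistics yet
        }
        Long hits = hitCounts.get(cuboid);
        return (hits == null ? 0L : hits) / (double) totalHits;
    }

    /** Made-up data: four cuboids, optionally with hit counts for two of them. */
    static CuboidStatsSketch sample(boolean withHits) {
        Map<Long, Long> rows = new TreeMap<>();
        rows.put(7L, 1000L);
        rows.put(6L, 400L);
        rows.put(4L, 50L);
        rows.put(2L, 100L);
        Map<Long, Long> hits = new TreeMap<>();
        if (withHits) {
            hits.put(7L, 10L);
            hits.put(4L, 30L);
        }
        return new CuboidStatsSketch(rows, hits);
    }

    public static void main(String[] args) {
        System.out.println(sample(false).queryProbability(4L)); // 0.25, uniform over 4 cuboids
        System.out.println(sample(true).queryProbability(4L));  // 0.75, 30 of 40 hits
    }
}
```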

Here, we also define the interface {{CuboidRecommendAlgorithm}} for the 
different kinds of cube planner algorithms. With no space limitation, 
pre-building all of the cuboids would bring the most benefit, but that is not 
feasible in the real world. Under a space limitation, the interface defines a 
method that recommends a set of high-benefit cuboids:
{code}
List<Long> recommend(double expansionRate);
{code}
Here, the expansion rate is relative to the size of the base cuboid.
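
As a rough illustration of how an algorithm could sit behind this method, the Java sketch below runs a greedy BPUS selection. It is not Kylin's implementation. The interface is a local stand-in reduced to the single quoted method, {{GreedyBpusSketch}} and its data are made up, cuboid ids are treated as dimension bitmasks, size is measured in rows, and the budget is assumed to be expansionRate times the base cuboid's row count with the base cuboid itself included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Local stand-in for the interface named above, reduced to the one quoted method. */
interface CuboidRecommendAlgorithm {
    List<Long> recommend(double expansionRate);
}

/** Hypothetical greedy BPUS selector; a sketch, not Kylin's implementation. */
final class GreedyBpusSketch implements CuboidRecommendAlgorithm {
    private final long baseCuboid;
    private final Map<Long, Long> rowCounts;     // cuboid id -> row count
    private final Map<Long, Double> probability; // cuboid id -> query probability

    GreedyBpusSketch(long baseCuboid, Map<Long, Long> rowCounts, Map<Long, Double> probability) {
        this.baseCuboid = baseCuboid;
        this.rowCounts = rowCounts;
        this.probability = probability;
    }

    /** Total cost: each cuboid scans its smallest selected ancestor (bitmask superset). */
    private double totalCost(List<Long> selected) {
        double cost = 0.0;
        for (Map.Entry<Long, Double> e : probability.entrySet()) {
            long id = e.getKey();
            long scanned = rowCounts.get(baseCuboid);
            for (long s : selected) {
                if ((id & s) == id) {
                    long rows = rowCounts.get(s);
                    scanned = Math.min(scanned, rows);
                }
            }
            cost += scanned * e.getValue();
        }
        return cost;
    }

    @Override
    public List<Long> recommend(double expansionRate) {
        // Assumption: the budget is in rows and includes the base cuboid itself.
        double budget = expansionRate * rowCounts.get(baseCuboid);
        double used = rowCounts.get(baseCuboid);
        List<Long> selected = new ArrayList<>();
        selected.add(baseCuboid);
        while (true) {
            Long best = null;
            double bestBenefit = 0.0;
            double costBefore = totalCost(selected);
            for (long candidate : rowCounts.keySet()) {
                long rows = rowCounts.get(candidate);
                if (selected.contains(candidate) || used + rows > budget) {
                    continue; // already chosen, or it would exceed the space budget
                }
                selected.add(candidate);
                double benefit = (costBefore - totalCost(selected)) / rows;
                selected.remove(selected.size() - 1);
                if (benefit > bestBenefit) { // RULE 1: prefer the cuboid with more benefit
                    bestBenefit = benefit;
                    best = candidate;
                }
            }
            if (best == null) {
                return selected;
            }
            selected.add(best);
            used += rowCounts.get(best);
        }
    }

    /** Made-up data: base cuboid 7 plus three descendants, uniform query probability. */
    static GreedyBpusSketch sample() {
        Map<Long, Long> rows = new TreeMap<>();
        rows.put(7L, 1000L);
        rows.put(6L, 400L);
        rows.put(4L, 50L);
        rows.put(2L, 100L);
        Map<Long, Double> prob = new TreeMap<>();
        for (Long id : rows.keySet()) {
            prob.put(id, 0.25);
        }
        return new GreedyBpusSketch(7L, rows, prob);
    }

    public static void main(String[] args) {
        System.out.println(sample().recommend(1.2)); // [7, 4, 2]
    }
}
```

With an expansion rate of 1.2 the budget is 1200 rows, so cuboid 6 (400 rows) never fits next to the base cuboid, while cuboids 4 and 2 are picked in order of benefit.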

  was:
Cube planner aims to recommend cost-effective cuboids. Currently the cost only 
takes into account the {color:#f79232}*scanned row count*{color} at the 
{color:#f79232}*query phase*{color}. The formula is:
bq. cuboid cost = (scanned row count on target cuboid) * (query probability)

The base cuboid is always prebuilt. If only the base cuboid is prebuilt, it is 
the target cuboid for every other cuboid, so the _(scanned row count)_ is 
expected to be large. When another cuboid is selected to be prebuilt, it 
becomes the target cuboid for itself and its descendant cuboids, and their 
_(scanned row count)_ is expected to shrink. The newly selected cuboid 
therefore brings some benefit. We employ BPUS (benefit per unit space) for 
cuboid selection. The benefit of a cuboid is:
bq. cuboid benefit = (total reduced cuboid cost) / (cuboid row count)

Cuboid selection is based on one basic rule:
bq. {color:#f79232}*RULE: Cuboids with more benefit will be preferred.*{color}



> Add basic support classes for cube planner algorithms
> -----------------------------------------------------
>
>                 Key: KYLIN-2826
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2826
>             Project: Kylin
>          Issue Type: Sub-task
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong



