[ 
https://issues.apache.org/jira/browse/KYLIN-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Shi updated KYLIN-187:
----------------------------
    Summary: Data Statistics Collection and Auto Modeling   (was: Data 
Statistics Analyzer )

> Data Statistics Collection and Auto Modeling 
> ---------------------------------------------
>
>                 Key: KYLIN-187
>                 URL: https://issues.apache.org/jira/browse/KYLIN-187
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Tools, Build and Test
>            Reporter: Luke Han
>              Labels: github-import
>             Fix For: Backlog
>
>
> 1 Overview 
> We need statistics data to support the following tasks:
> * Design cube metadata based on query log
> * Design HBase row-key based on data distribution (e.g. histogram and 
> cardinality)
> * Choose execution plan based on cuboid data
> 2 Data Analyzer 
> We need to analyze the Hive data and the cube data in two phases. First, we 
> will analyze the Hive data to guide the first-round design of the row key. 
> Then we will analyze the cube data to refine the row key design and to 
> estimate the cost of queries.
> 2.1 Analyze Hive Data 
> We need to collect the following statistics on the Hive table:
> * Cardinality of each dimension
> * Cardinality of dimension combination (optional)
> * Value distribution of each dimension (optional)
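
A minimal sketch of the three statistics listed above, assuming the Hive table has already been read into memory as a list of dicts (one dict per row). The function names and the row layout are hypothetical and are not part of Kylin; a real job would more likely compute these with Hive queries or approximate counters.

```python
from collections import Counter


def dimension_cardinalities(rows, dimensions):
    """Return the number of distinct values seen for each dimension."""
    distinct = {dim: set() for dim in dimensions}
    for row in rows:
        for dim in dimensions:
            distinct[dim].add(row[dim])
    return {dim: len(values) for dim, values in distinct.items()}


def combination_cardinality(rows, dimensions):
    """Return the number of distinct value tuples for a dimension combination."""
    return len({tuple(row[dim] for dim in dimensions) for row in rows})


def value_distribution(rows, dimension):
    """Return a value -> row count mapping (an exact histogram) for one dimension."""
    return Counter(row[dimension] for row in rows)
```

Exact sets are only practical for small samples; on a full table the same interface could be backed by an approximate distinct counter.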
> Based on the statistics of the Hive data, we can design the row key groups 
> from high-cardinality dimensions to low-cardinality dimensions. We should 
> also split the dimensions evenly across the row key groups, which will 
> reduce the number of cuboids.
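
The paragraph above can be sketched as follows. This is an illustration under stated assumptions, not Kylin's actual algorithm: the helper names are hypothetical, and the cuboid estimate assumes that a group of k dimensions contributes 2**k cuboids independently of the other groups. Under that assumption, an even split minimizes the total for a fixed number of groups.

```python
def order_by_cardinality(cardinalities):
    """Order dimensions from highest to lowest cardinality (ties broken by name)."""
    return sorted(cardinalities, key=lambda dim: (-cardinalities[dim], dim))


def split_evenly(ordered_dims, group_count):
    """Cut the ordered list into contiguous groups whose sizes differ by at most one."""
    base, extra = divmod(len(ordered_dims), group_count)
    groups, start = [], 0
    for index in range(group_count):
        size = base + (1 if index < extra else 0)
        groups.append(ordered_dims[start:start + size])
        start += size
    return groups


def estimated_cuboids(groups):
    """Assumption: each group of k dimensions yields 2**k cuboids on its own."""
    return sum(2 ** len(group) for group in groups)
```

For six dimensions, this estimate gives 64 cuboids for a single group, 34 for a 5/1 split, and 16 for an even 3/3 split.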
> 2.2 Analyze Cube Data 
> We need to collect the following statistics on the cube data:
> * Count of each cuboid
> * Group ratio of each cuboid = current cuboid count / lower group base cuboid 
> count 
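
A small sketch of the group ratio formula above. The issue does not define "lower group base cuboid" any further, so the sketch takes that count as an input rather than guessing how it is chosen; the function name and the cuboid keys are hypothetical.

```python
def group_ratios(cuboid_counts, base_cuboid_count):
    """Divide each cuboid's row count by the supplied base cuboid count."""
    if base_cuboid_count <= 0:
        raise ValueError("base_cuboid_count must be positive")
    return {
        cuboid: count / base_cuboid_count
        for cuboid, count in cuboid_counts.items()
    }
```

A ratio close to 1 would suggest that a cuboid aggregates little relative to the base cuboid it is compared with.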
> 3 Query Analyzer 
> TBD
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/KylinOLAP/Kylin/issues/318
> Created by: [lukehan|https://github.com/lukehan]
> Labels: newfeature, 
> Milestone: Backlog
> Created at: Fri Dec 26 15:21:24 CST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
