[ https://issues.apache.org/jira/browse/KYLIN-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roger Shi updated KYLIN-187:
----------------------------
Summary: Data Statistics Collection and Auto Modeling (was: Data Statistics Analyzer)
> Data Statistics Collection and Auto Modeling
> ---------------------------------------------
>
> Key: KYLIN-187
> URL: https://issues.apache.org/jira/browse/KYLIN-187
> Project: Kylin
> Issue Type: New Feature
> Components: Tools, Build and Test
> Reporter: Luke Han
> Labels: github-import
> Fix For: Backlog
>
>
> 1 Overview
> We need statistics data for the following purposes:
> * Design cube metadata based on query log
> * Design HBase row-key based on data distribution (e.g. histogram and
> cardinality)
> * Choose execution plan based on cuboid data
> 2 Data Analyzer
> We need to analyze the Hive data and the cube data in two phases. First, we will
> analyze the Hive data to guide the first-round design of the row key. Then we will
> analyze the cube data to refine the row key design and to estimate the cost of
> queries.
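> One way cube statistics could feed a query cost estimate is sketched below. It is a minimal sketch, not Kylin's planner: it assumes the cost of a query grouped by some dimensions is approximated by the row count of the smallest cuboid that contains all of those dimensions. The function name `estimate_query_cost` and the `cuboid_counts` input (cuboid dimension set mapped to row count) are hypothetical.

```python
def estimate_query_cost(query_dims, cuboid_counts):
    """Pick the cheapest cuboid able to answer a query.

    Hypothetical helper: cuboid_counts maps frozenset(dimensions) -> row count.
    A cuboid can answer the query if it contains every queried dimension;
    its row count is used as a rough proxy for scan cost.
    Returns (sorted dimension tuple, row count), or None if nothing covers the query.
    """
    needed = set(query_dims)
    candidates = [
        (count, tuple(sorted(dims)))
        for dims, count in cuboid_counts.items()
        if needed <= set(dims)
    ]
    if not candidates:
        return None
    # Smallest row count wins; ties are broken by the dimension tuple.
    count, dims = min(candidates)
    return dims, count


if __name__ == "__main__":
    counts = {
        frozenset(): 1,
        frozenset({"site"}): 2,
        frozenset({"site", "category"}): 3,
        frozenset({"site", "category", "seller"}): 4,
    }
    print(estimate_query_cost(["category"], counts))
```

> With these example counts, a query on `category` alone is answered from the `(category, site)` cuboid, because no smaller cuboid contains `category`.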
> 2.1 Analyze Hive Data
> We need to collect the following statistics on the Hive table:
> * Cardinality of each dimension
> * Cardinality of dimension combination (optional)
> * Value distribution of each dimension (optional)
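> The three Hive statistics listed above can be illustrated on a small in-memory sample. This is a minimal sketch under the assumption that rows are available as Python dicts; the sample data and function names are hypothetical. It counts exactly, whereas a tool running over a large Hive table would more likely use an approximate estimator such as HyperLogLog.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample standing in for Hive rows: dimension name -> value.
rows = [
    {"site": "US", "category": "books", "seller": "s1"},
    {"site": "US", "category": "toys", "seller": "s2"},
    {"site": "DE", "category": "books", "seller": "s3"},
    {"site": "US", "category": "books", "seller": "s4"},
]
dimensions = ["site", "category", "seller"]


def cardinality(rows, dim):
    """Number of distinct values of a single dimension."""
    return len({row[dim] for row in rows})


def combo_cardinality(rows, dims):
    """Number of distinct value tuples over a combination of dimensions."""
    return len({tuple(row[d] for d in dims) for row in rows})


def distribution(rows, dim):
    """Histogram of one dimension: value -> number of rows."""
    return Counter(row[dim] for row in rows)


if __name__ == "__main__":
    for d in dimensions:
        print(d, cardinality(rows, d), dict(distribution(rows, d)))
    for pair in combinations(dimensions, 2):
        print(pair, combo_cardinality(rows, pair))
```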
> Based on the Hive data statistics, we can order the row key groups from
> high-cardinality dimensions to low-cardinality dimensions. In addition, we should
> split the dimensions evenly across the row key groups, which reduces the number
> of cuboids.
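> The ordering and even split described above can be sketched as follows. This is an illustration under stated assumptions, not Kylin's algorithm: dimensions are sorted by descending cardinality, cut into contiguous groups whose sizes differ by at most one, and each group of m dimensions is assumed to contribute 2^m cuboids, so the total is an upper bound that ignores cuboids shared between groups. All names and cardinalities are hypothetical.

```python
def order_by_cardinality(cards):
    """Sort dimension names from high to low cardinality (ties broken by name)."""
    return sorted(cards, key=lambda d: (-cards[d], d))


def split_evenly(dims, k):
    """Split an ordered dimension list into k contiguous groups of near-equal size."""
    base, extra = divmod(len(dims), k)
    groups, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        groups.append(dims[start:start + size])
        start += size
    return groups


def cuboid_count(groups):
    """Upper bound on cuboids: a group of m dimensions yields 2**m combinations."""
    return sum(2 ** len(g) for g in groups)


if __name__ == "__main__":
    cards = {"a": 1000, "b": 50, "c": 10, "d": 5, "e": 3, "f": 2}
    ordered = order_by_cardinality(cards)
    print(cuboid_count([ordered]), cuboid_count(split_evenly(ordered, 2)))
```

> For six dimensions, one group gives 2^6 = 64 cuboids, while two groups of three give 2^3 + 2^3 = 16, which is the reduction the even split aims for.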
> 2.2 Analyze Cube Data
> We need to analyze the following statistics on the data cube:
> * Count of each cuboid
> * Group ratio of each cuboid = current cuboid count / lower group base cuboid
> count
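> Both cube statistics can be illustrated by enumerating every cuboid of a small sample. This is a minimal sketch with hypothetical data and function names; it treats a cuboid's count as its number of distinct dimension-value tuples, and it reads the ratio's denominator as the row count of the base cuboid (all dimensions). If "lower group base cuboid" means something narrower, such as a parent cuboid one level down, only the denominator passed in changes.

```python
from itertools import combinations

# Hypothetical sample rows: dimension name -> value.
rows = [
    {"site": "US", "category": "books", "seller": "s1"},
    {"site": "US", "category": "toys", "seller": "s2"},
    {"site": "DE", "category": "books", "seller": "s3"},
    {"site": "US", "category": "books", "seller": "s4"},
]
dimensions = ("site", "category", "seller")


def cuboid_row_count(rows, dims):
    """Rows in the cuboid grouped by dims, i.e. distinct value tuples."""
    return len({tuple(row[d] for d in dims) for row in rows})


def all_cuboid_stats(rows, dimensions):
    """Map each cuboid (tuple of dims) to (row count, ratio to base cuboid count)."""
    base = cuboid_row_count(rows, dimensions)  # assumed denominator
    stats = {}
    for m in range(len(dimensions) + 1):
        for dims in combinations(dimensions, m):
            count = cuboid_row_count(rows, dims)
            stats[dims] = (count, count / base)
    return stats


if __name__ == "__main__":
    for dims, (count, ratio) in all_cuboid_stats(rows, dimensions).items():
        print(dims, count, round(ratio, 2))
```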
> 3 Query Analyzer
> TBD
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/KylinOLAP/Kylin/issues/318
> Created by: [lukehan|https://github.com/lukehan]
> Labels: newfeature,
> Milestone: Backlog
> Created at: Fri Dec 26 15:21:24 CST 2014
> State: open