[
https://issues.apache.org/jira/browse/KYLIN-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275628#comment-16275628
]
Yifei Wu commented on KYLIN-3078:
---------------------------------
The key is to clarify how the percentile measure affects the cube size
estimate and to find a more accurate way to estimate the size of a percentile
measure.
The measure is implemented with the t-digest algorithm, so we can derive a
regular pattern from the analysis in the t-digest paper and from the
statistics collected in local tests.
> the estimated size of percentile measure is too big
> ----------------------------------------------------
>
> Key: KYLIN-3078
> URL: https://issues.apache.org/jira/browse/KYLIN-3078
> Project: Kylin
> Issue Type: Bug
> Reporter: Yifei Wu
> Assignee: Yifei Wu
> Priority: Critical
>
> To choose a shard number that keeps the size per shard under control, we
> need a rough estimate of the cube size before building the cube, obtained by
> summing the sizes of all dimensions and measures. However, the current size
> calculation for the percentile measure is inaccurate and causes too many
> partitions for cube storage. This may in turn affect SQL query performance.
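The relationship described above, between the per-measure size estimate and the resulting number of partitions, can be sketched roughly as follows. This is only an illustration under stated assumptions, not Kylin's actual estimator: the function names, the byte sizes, the row count, and the 1 GiB target shard size are all hypothetical.

```python
import math


def estimate_row_bytes(dimension_bytes, measure_bytes):
    """Rough per-row size: sum of all dimension and measure sizes."""
    return sum(dimension_bytes) + sum(measure_bytes)


def estimate_shard_count(row_count, row_bytes, target_shard_bytes):
    """Number of shards needed so each shard stays near the target size."""
    total_bytes = row_count * row_bytes
    return max(1, math.ceil(total_bytes / target_shard_bytes))


# Hypothetical inputs, chosen only to show the effect.
dims = [4, 4, 8]                 # assumed encoded dimension sizes in bytes
other_measures = [8, 8]          # assumed sizes of e.g. SUM and COUNT
rows = 10_000_000                # assumed row count
target = 1024 ** 3               # assumed target shard size of 1 GiB

# The same cube with two different per-row estimates for the percentile measure.
inflated = estimate_shard_count(
    rows, estimate_row_bytes(dims, other_measures + [10_000]), target)
tighter = estimate_shard_count(
    rows, estimate_row_bytes(dims, other_measures + [1_000]), target)

print("shards with inflated percentile estimate:", inflated)
print("shards with tighter percentile estimate:", tighter)
```

Because the percentile estimate dominates the per-row size in this sketch, any error in it carries over almost directly into the shard count.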