[
https://issues.apache.org/jira/browse/KYLIN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184748#comment-15184748
]
Yerui Sun commented on KYLIN-1379:
----------------------------------
After discussing with [[email protected]], here are some key points for the next version:
* The serialized size of a RoaringBitmap is approximately 2*N bytes, where N is
the cardinality. Due to memory concerns, we'll limit the precision (cardinality)
of the bitmap measure to at most 10M (10,000,000). A column whose cardinality
exceeds this limit **will cause the cube build to fail**.
* The bitmap measure will encode column values with a dictionary, so that all
data types are supported, including Long, String, Date, etc.
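A minimal sketch of the dictionary-encoding idea, assuming a dictionary shared by all cells of a column so that partial bitmaps can be merged with a bitwise OR. `ValueDictionary`, `bitmap_of`, `count_distinct`, and `PrecisionExceededError` are hypothetical names, not Kylin APIs, and a plain Python int stands in for RoaringBitmap.

```python
from collections.abc import Hashable, Iterable
from datetime import date
from typing import Dict


class PrecisionExceededError(Exception):
    """Raised when a column has more distinct values than the precision allows."""


class ValueDictionary:
    """Hypothetical dictionary mapping values of any hashable type to dense int IDs."""

    def __init__(self, precision: int = 10_000_000) -> None:
        self.precision = precision            # max distinct values (10M cap)
        self._ids: Dict[Hashable, int] = {}   # value -> dense int ID

    def id_of(self, value: Hashable) -> int:
        value_id = self._ids.get(value)
        if value_id is None:
            if len(self._ids) >= self.precision:
                # Per the proposal, exceeding the cap fails the cube build.
                raise PrecisionExceededError(
                    f"column cardinality exceeds precision {self.precision}")
            value_id = len(self._ids)
            self._ids[value] = value_id
        return value_id


def bitmap_of(values: Iterable[Hashable], dictionary: ValueDictionary) -> int:
    """Build a bitset (a Python int) with bit i set for each dictionary ID i."""
    bits = 0
    for value in values:
        bits |= 1 << dictionary.id_of(value)
    return bits


def count_distinct(bits: int) -> int:
    """Exact distinct count = number of set bits."""
    return bin(bits).count("1")


if __name__ == "__main__":
    shared = ValueDictionary()
    cell_a = bitmap_of(["cn", "us", "cn"], shared)
    cell_b = bitmap_of(["us", "jp", date(2016, 3, 8)], shared)
    # Merging two partial cells is a bitwise OR over the shared ID space.
    print(count_distinct(cell_a | cell_b))  # prints 4
```

Because every value is first mapped to a dense int ID, strings, dates, and numbers all end up as positions in the same bitmap; the precision check in `id_of` is where a build would fail once the 10M cap is exceeded.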
I'll work on this later and post a patch for the 2.x-staging branch.
> More stable precise count distinct implements after KYLIN-1186
> --------------------------------------------------------------
>
> Key: KYLIN-1379
> URL: https://issues.apache.org/jira/browse/KYLIN-1379
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Affects Versions: v2.1, v1.3
> Reporter: Yerui Sun
> Assignee: Yerui Sun
>
> After KYLIN-1186, we gained the ability to count distinct values of Int type
> columns precisely.
> However, the implementation in KYLIN-1186 is not stable, especially in the
> 2.x-staging branch.
> The reason is that in 2.x the measure's maxlength is used to allocate memory,
> and KYLIN-1186 hardcodes the BitmapMeasure size to 8MB, causing OOM during
> cube building.
> To resolve this problem, we introduce a precision on the bitmap measure,
> such as bitmap(100), bitmap(10000), or bitmap(1000000), meaning the measure
> accepts a cardinality of at most 100/10000/1M respectively. This should be
> fine in practice: if the distinct count exceeds 1,000,000, the HyperLogLog
> measure, which produces approximate results, should be acceptable.
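To illustrate why a declared precision helps with memory allocation, here is a hypothetical sketch that turns a return type such as bitmap(1000000) into a size budget, using the ~2*N bytes estimate from the comment above. `parse_precision` and `estimated_max_bytes` are illustrative names, not Kylin's code, and real RoaringBitmap sizes vary with how values are distributed.

```python
import re

# Assumption: ~2 bytes per distinct value, per the KYLIN-1379 discussion;
# actual RoaringBitmap sizes depend on the container layout of the values.
BYTES_PER_VALUE = 2
HARDCODED_BYTES = 8 * 1024 * 1024  # the fixed 8MB from KYLIN-1186
MAX_PRECISION = 10_000_000         # upper bound proposed in the comment

_BITMAP_TYPE = re.compile(r"bitmap\((\d+)\)")


def parse_precision(return_type: str) -> int:
    """Extract N from a measure return type such as 'bitmap(10000)'."""
    match = _BITMAP_TYPE.fullmatch(return_type.strip().lower())
    if match is None:
        raise ValueError(f"not a bitmap measure type: {return_type!r}")
    precision = int(match.group(1))
    if not 0 < precision <= MAX_PRECISION:
        raise ValueError(
            f"precision must be in 1..{MAX_PRECISION}, got {precision}")
    return precision


def estimated_max_bytes(return_type: str) -> int:
    """Approximate serialized size budget for one bitmap measure value."""
    return BYTES_PER_VALUE * parse_precision(return_type)
```

With this estimate, bitmap(100), bitmap(10000), and bitmap(1000000) budget 200, 20,000, and 2,000,000 bytes respectively, all far below the fixed 8MB, while the 10M cap corresponds to roughly 20,000,000 bytes.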
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)