[ https://issues.apache.org/jira/browse/KYLIN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640331#comment-16640331 ]
Ruslan Dautkhanov commented on KYLIN-3491:
------------------------------------------
Is this optimization for cube build time only, or will it help with query
performance too?
Would you recommend bitmapCounter for high-cardinality columns?
I assume it will work super fast for low-cardinality columns like `product type`,
but would it work on high-cardinality columns? Say the number of distinct values
in a column `household_id` is 1 billion: would Bitmap Counter, and Kylin in
general, handle `count(distinct household_id)` well?
> Improve the cube building process when using global dictionary
> --------------------------------------------------------------
>
> Key: KYLIN-3491
> URL: https://issues.apache.org/jira/browse/KYLIN-3491
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Reporter: Zhong Yanghong
> Assignee: Zhong Yanghong
> Priority: Major
> Fix For: v2.5.0
>
> Attachments: APACHE-KYLIN-3491-with-fix.patch, APACHE-KYLIN-3491.patch
>
>
> In the current cubing process, if the global dictionary is very large, it is
> hard to encode raw values into ids for the bitmap input: because the raw data
> records are unsorted, dictionary slices are swapped in and out frequently. We
> need a refined process. The idea is as follows:
> # for each source data block, a mapper generates the distinct values and sorts them.
> # encode the sorted distinct values and generate a shrunken dict for each source data block.
> # when building the base cuboid, use each source data block's shrunken dict for encoding.
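The three steps above can be illustrated with a minimal Python sketch. It assumes the global dictionary can be modeled as an in-memory `dict` from raw value to id and that a Python `int` can stand in for the bitmap; all function names are hypothetical, and this is not Kylin's actual (Java) implementation.

```python
from typing import Dict, Iterable, List


def build_shrunken_dict(block: Iterable[str], global_dict: Dict[str, int]) -> Dict[str, int]:
    """Steps 1-2: sort the block's distinct values and look up their global ids."""
    distinct_sorted = sorted(set(block))
    # Looking values up in sorted order means a slice-based global dictionary
    # would touch each slice at most once per block instead of swapping slices.
    return {value: global_dict[value] for value in distinct_sorted}


def encode_block(block: Iterable[str], shrunken: Dict[str, int]) -> List[int]:
    """Step 3: encode every raw value of the block with the small per-block dict."""
    return [shrunken[value] for value in block]


def to_bitmap(ids: Iterable[int]) -> int:
    """Set bit i for every id i (a Python int stands in for a real bitmap)."""
    bitmap = 0
    for i in ids:
        bitmap |= 1 << i
    return bitmap


def count_distinct(blocks: List[List[str]], global_dict: Dict[str, int]) -> int:
    """Encode each block independently, OR the bitmaps, and count the set bits."""
    merged = 0
    for block in blocks:
        shrunken = build_shrunken_dict(block, global_dict)
        merged |= to_bitmap(encode_block(block, shrunken))
    return bin(merged).count("1")


if __name__ == "__main__":
    blocks = [["b", "a", "c", "a"], ["d", "b", "d"]]
    # Hypothetical global dictionary: every value across all blocks gets a unique id.
    global_dict = {v: i for i, v in enumerate(sorted({v for b in blocks for v in b}))}
    print(encode_block(blocks[0], build_shrunken_dict(blocks[0], global_dict)))  # [1, 0, 2, 0]
    print(count_distinct(blocks, global_dict))  # 4
```

Because each shrunken dict only holds the distinct values of one block, the per-row encoding in step 3 hits a small map instead of the full global dictionary, while the ids stay global, so bitmaps from different blocks can still be OR-ed together.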
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)