[ https://issues.apache.org/jira/browse/KYLIN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575986#comment-16575986 ]

Zhong Yanghong commented on KYLIN-3491:
---------------------------------------

For a dimension with a cardinality of around 90M, the encoding performance 
compares as follows:
 * Directly using the global dictionary: 165 min
 * Using two steps with a shrunken dictionary: 35 + 11 = 46 min

That is roughly a 3.6x reduction in encoding time (165 / 46).

> Improve the cube building process when using global dictionary
> --------------------------------------------------------------
>
>                 Key: KYLIN-3491
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3491
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>            Priority: Major
>
> In the current cubing process, the raw data records are unsorted, so when the 
> global dictionary is very large, encoding raw values into ids for the bitmap 
> input is costly: the dictionary slices are swapped in and out frequently. We 
> need a refined process. The idea is as follows:
>  # For each source data block, a mapper generates the distinct values and 
> sorts them.
>  # The sorted distinct values are encoded, producing a shrunken dict for each 
> source data block.
>  # When building the base cuboid, the shrunken dict of each source data block 
> is used for encoding.
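
The three steps above can be sketched in a few lines of Python. This is only an 
illustration of the idea, not Kylin's implementation: the global dictionary is 
modelled as a plain in-memory dict whose ids follow sorted order, and the names 
build_global_dict, build_shrunken_dict and encode_block are hypothetical. In 
Kylin the global dictionary is split into slices that are loaded on demand, 
which is why looking values up in sorted order (step 2) avoids the repeated 
slice swapping.

```python
from typing import Dict, Iterable, List


def build_global_dict(values: Iterable[str]) -> Dict[str, int]:
    """Stand-in for the global dictionary (assumption: ids follow sorted order)."""
    return {value: idx for idx, value in enumerate(sorted(set(values)))}


def build_shrunken_dict(block: List[str], global_dict: Dict[str, int]) -> Dict[str, int]:
    """Steps 1 and 2: sort the block's distinct values, then look each one up once.

    Because the lookups happen in sorted order, a sliced dictionary would be
    walked front to back instead of jumping between slices.
    """
    distinct_sorted = sorted(set(block))
    return {value: global_dict[value] for value in distinct_sorted}


def encode_block(block: List[str], shrunken_dict: Dict[str, int]) -> List[int]:
    """Step 3: encode the unsorted records using only the small per-block dict."""
    return [shrunken_dict[value] for value in block]


# Two hypothetical source data blocks with unsorted, repeated values.
blocks = [["u3", "u1", "u3", "u7"], ["u2", "u7", "u2"]]
global_dict = build_global_dict(value for block in blocks for value in block)
encoded = [encode_block(b, build_shrunken_dict(b, global_dict)) for b in blocks]
print(encoded)
```

Each shrunken dict holds only the values that occur in its own block, so the 
per-record encoding in step 3 never touches the large global dictionary.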



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
