Zhong Yanghong created KYLIN-3491:

             Summary: Improve the cube building process when using global 
                 Key: KYLIN-3491
                 URL: https://issues.apache.org/jira/browse/KYLIN-3491
             Project: Kylin
          Issue Type: Improvement
            Reporter: Zhong Yanghong
            Assignee: Zhong Yanghong

In the current cubing process, if the global dictionary is very large, it is 
costly to encode raw values into ids for the bitmap input: because the raw 
data records are unsorted, the dictionary slices are swapped in and out 
frequently. We need a refined process. The idea is as follows:
 # For each source data block, a mapper generates the distinct values and 
sorts them.
 # The sorted distinct values are encoded, and a shrunken dict is generated 
for each source data block.
 # When building the base cuboid, the shrunken dict of each source data block 
is used for encoding.
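
The three steps above can be sketched roughly as follows. This is only an illustration in Python, not Kylin code: a plain dict stands in for the sliced global dictionary, and the names build_shrunken_dict and encode_block are hypothetical.

```python
from typing import Dict, Iterable, List


def build_shrunken_dict(block: Iterable[str], global_dict: Dict[str, int]) -> Dict[str, int]:
    """Steps 1-2: collect the block's distinct values, sort them, look each one up once."""
    distinct_sorted = sorted(set(block))
    # Looking values up in sorted order walks the global dictionary in key
    # order, so a sliced dictionary would not need to reload the same slice
    # repeatedly (assumption: slices are split by key range).
    return {value: global_dict[value] for value in distinct_sorted}


def encode_block(block: Iterable[str], shrunken: Dict[str, int]) -> List[int]:
    """Step 3: encode the unsorted raw records with the small per-block dict."""
    return [shrunken[value] for value in block]


# Toy data: a plain dict stands in for the very large global dictionary.
global_dict = {v: i for i, v in enumerate(sorted(["u1", "u2", "u3", "u4", "u5", "u6"]))}
blocks = [["u5", "u1", "u5", "u3"], ["u2", "u6", "u2"]]

shrunken_dicts = [build_shrunken_dict(b, global_dict) for b in blocks]
encoded = [encode_block(b, d) for b, d in zip(blocks, shrunken_dicts)]
```

Each shrunken dict holds only the values that occur in its block, so it should fit in memory even when the global dictionary does not, and the ids it produces are the same ids the global dictionary would assign.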

This message was sent by Atlassian JIRA
