[jira] [Commented] (KYLIN-3491) Improve the cube building process when using global dictionary

Shaofeng SHI (JIRA) Thu, 23 Aug 2018 23:51:39 -0700


    [ https://issues.apache.org/jira/browse/KYLIN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591224#comment-16591224 ]


Shaofeng SHI commented on KYLIN-3491:
-------------------------------------

This error happened while building an empty segment
(20220101000000_20230101000000). An empty segment has no dictionary info for
the global dict.

 

For example, the previous segment contains the global dict reference:
{code:java}
"BUYER_ACCOUNT.ACCOUNT_COUNTRY" : "/dict/DEFAULT.TEST_ACCOUNT/ACCOUNT_COUNTRY/7f7ba5c1-dedc-7958-e362-60554b977ace.dict",
"TEST_KYLIN_FACT.TEST_COUNT_DISTINCT_BITMAP" : "/dict/DEFAULT.TEST_KYLIN_FACT/TEST_COUNT_DISTINCT_BITMAP/bc85a6c5-2054-4240-acf3-3f6174e390f0.dict"
{code}
But the empty segment has no such entry; it contains only:

{code:java}
"BUYER_ACCOUNT.ACCOUNT_COUNTRY" : "/dict/DEFAULT.TEST_ACCOUNT/ACCOUNT_COUNTRY/f591edbf-010e-245b-725c-17a4e90ead2e.dict"
{code}
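One way to make the mismatch visible is to compare the dictionary maps of the two segments and report which global-dict columns have no reference. This is a hypothetical sketch, not Kylin code: it assumes the dictionary references are available as a `Map<String, String>` from column to resource path (as in the snippets above), and it assumes, as those snippets suggest, that `TEST_KYLIN_FACT.TEST_COUNT_DISTINCT_BITMAP` is the global-dict column. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SegmentDictCheck {
    // Returns the global-dict columns that have no dictionary reference in a segment.
    // Hypothetical helper for illustration; not a Kylin API.
    static List<String> missingGlobalDicts(Map<String, String> segmentDicts, List<String> globalDictColumns) {
        List<String> missing = new ArrayList<>();
        for (String column : globalDictColumns) {
            if (!segmentDicts.containsKey(column)) {
                missing.add(column);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: this is the only global-dict column in the cube.
        List<String> globalDictColumns = List.of("TEST_KYLIN_FACT.TEST_COUNT_DISTINCT_BITMAP");

        // Dictionary references copied from the two segments shown in the comment.
        Map<String, String> previousSegment = new LinkedHashMap<>();
        previousSegment.put("BUYER_ACCOUNT.ACCOUNT_COUNTRY",
                "/dict/DEFAULT.TEST_ACCOUNT/ACCOUNT_COUNTRY/7f7ba5c1-dedc-7958-e362-60554b977ace.dict");
        previousSegment.put("TEST_KYLIN_FACT.TEST_COUNT_DISTINCT_BITMAP",
                "/dict/DEFAULT.TEST_KYLIN_FACT/TEST_COUNT_DISTINCT_BITMAP/bc85a6c5-2054-4240-acf3-3f6174e390f0.dict");

        Map<String, String> emptySegment = new LinkedHashMap<>();
        emptySegment.put("BUYER_ACCOUNT.ACCOUNT_COUNTRY",
                "/dict/DEFAULT.TEST_ACCOUNT/ACCOUNT_COUNTRY/f591edbf-010e-245b-725c-17a4e90ead2e.dict");

        System.out.println(missingGlobalDicts(previousSegment, globalDictColumns)); // []
        System.out.println(missingGlobalDicts(emptySegment, globalDictColumns));    // [TEST_KYLIN_FACT.TEST_COUNT_DISTINCT_BITMAP]
    }
}
```

A check like this only detects the gap; how the job engine should handle an empty segment (skip the global dict, or carry the previous reference forward) is a separate design decision.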
 

> Improve the cube building process when using global dictionary
> --------------------------------------------------------------
>
>                 Key: KYLIN-3491
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3491
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>            Priority: Major
>             Fix For: v2.5.0
>
>         Attachments: APACHE-KYLIN-3491.patch
>
>
> With the current cubing process, if the global dictionary is very large, it is
> hard to encode raw values into ids for the bitmap input: because the raw data
> records are unsorted, dictionary slices are swapped in and out frequently. We
> need a refined process. The idea is as follows:
>  # For each source data block, a mapper generates the distinct values and
> sorts them.
>  # Encode the sorted distinct values and generate a shrunken dict for each
> source data block.
>  # When building the base cuboid, use each source data block's shrunken dict
> for encoding.
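To see why unsorted input causes frequent slice swapping, here is a toy model. It is a sketch under stated assumptions, not Kylin code: it assumes the global dictionary is split into slices that each cover a contiguous, sorted key range, and that only one slice is held in memory at a time. `SliceSwapDemo`, `sliceOf`, and `countSliceLoads` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SliceSwapDemo {
    // Hypothetical model: sliceBounds holds the first key of each slice, ascending.
    // Assumes every value sorts at or after sliceBounds[0].
    static int sliceOf(String value, String[] sliceBounds) {
        int idx = Arrays.binarySearch(sliceBounds, value);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent,
        // so the owning slice is insertionPoint - 1.
        return idx >= 0 ? idx : -idx - 2;
    }

    // Counts how often a different slice must be loaded, with a one-slice cache.
    static int countSliceLoads(List<String> lookups, String[] sliceBounds) {
        int loads = 0;
        int current = -1;
        for (String value : lookups) {
            int slice = sliceOf(value, sliceBounds);
            if (slice != current) {
                loads++;
                current = slice;
            }
        }
        return loads;
    }

    public static void main(String[] args) {
        String[] bounds = {"a", "h", "p"}; // slices: [a,h), [h,p), [p,...)
        List<String> unsorted = List.of("zoe", "alice", "quinn", "bob", "xavier", "henry", "carl");
        List<String> sorted = new ArrayList<>(unsorted);
        Collections.sort(sorted);
        System.out.println("unsorted loads: " + countSliceLoads(unsorted, bounds)); // 7
        System.out.println("sorted loads:   " + countSliceLoads(sorted, bounds));   // 3
    }
}
```

With these seven values, the unsorted order forces a slice load on every lookup, while the sorted order loads each slice once. Sorting the distinct values per block (step 1) is what turns random dictionary access into sequential access.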
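The three proposed steps can be sketched as follows. This is a minimal, hypothetical illustration: the global dictionary is modeled as an in-memory `Map<String, Integer>` rather than Kylin's on-disk, sliced dictionary, and `ShrunkenDictDemo`, `distinctSorted`, `shrink`, and `encode` are illustrative names, not Kylin APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class ShrunkenDictDemo {
    // Step 1: collect the distinct values of one source block, in sorted order.
    static TreeSet<String> distinctSorted(List<String> blockValues) {
        return new TreeSet<>(blockValues);
    }

    // Step 2: look up each sorted distinct value in the global dict once and keep
    // only those entries; this is the block's "shrunken" dict.
    static Map<String, Integer> shrink(TreeSet<String> distinct, Map<String, Integer> globalDict) {
        Map<String, Integer> shrunken = new LinkedHashMap<>();
        for (String value : distinct) { // ascending order, so global-dict access is sequential
            Integer id = globalDict.get(value);
            if (id == null) {
                throw new IllegalStateException("value missing from global dict: " + value);
            }
            shrunken.put(value, id);
        }
        return shrunken;
    }

    // Step 3: encode the block's raw (unsorted) records using only the small shrunken dict.
    static int[] encode(List<String> blockValues, Map<String, Integer> shrunken) {
        int[] ids = new int[blockValues.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = shrunken.get(blockValues.get(i));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical global dict; ids assigned in arbitrary (e.g. append) order.
        Map<String, Integer> globalDict = new TreeMap<>();
        String[] all = {"quinn", "zoe", "bob", "alice", "xavier", "henry", "carl"};
        for (int i = 0; i < all.length; i++) {
            globalDict.put(all[i], i + 1);
        }

        List<String> block = List.of("zoe", "bob", "zoe", "henry", "bob");
        Map<String, Integer> shrunken = shrink(distinctSorted(block), globalDict);
        System.out.println(shrunken);                                  // {bob=3, henry=6, zoe=2}
        System.out.println(Arrays.toString(encode(block, shrunken)));  // [2, 3, 2, 6, 3]
    }
}
```

The large global dictionary is consulted once per distinct value and in sorted order; the per-record encoding then touches only the shrunken dict, which is small enough to stay in memory.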



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
