[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15381713#comment-15381713 ]

Zhong Yanghong commented on KYLIN-1869:
---------------------------------------

In some cases the lookup table is very big, for example a table of seller 
information. The cardinality of seller_id is more than 10 million. Even if 
we include only the seller name as a derived column (64 bytes), the snapshot 
of the lookup table will be around 720 MB. Currently we put the seller name 
together with the seller id into the fact table. In 1.5.2 we can declare a 
joint dimension for these two columns. Is there a better solution for this 
kind of high-cardinality lookup column?
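The 720 MB figure above can be reproduced with a quick back-of-envelope calculation. The per-row overhead below is an illustrative assumption (a 64-byte name plus roughly 8 bytes for the key), not a number taken from Kylin's snapshot format:

```python
# Rough snapshot size estimate for a high-cardinality lookup table.
# Assumptions (illustrative, not Kylin internals): 10M rows, a 64-byte
# seller name, plus ~8 bytes per row for the seller id / row overhead.
rows = 10_000_000
name_bytes = 64
overhead_bytes = 8  # hypothetical per-row key/overhead

total_mb = rows * (name_bytes + overhead_bytes) / 1_000_000  # decimal MB
print(f"~{total_mb:.0f} MB")  # → ~720 MB
```

This is why a 10-million-row lookup table becomes a real storage and memory-loading problem even with a single derived column.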

> When building a snapshot for lookup tables, should we build only those 
> dimensions used by the model, or the whole table
> -----------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-1869
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1869
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>
> Currently when building a snapshot for a lookup table, the input is the whole 
> value set of the lookup table, which may not be reasonable. In some cases a 
> lookup table has tens of columns, while a model or cube uses only a few of 
> them, say 1 to 5. The unused columns make the snapshot unnecessarily large, 
> which burdens both storage and loading.
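The improvement described above amounts to projecting the lookup table down to the referenced columns before the snapshot is built. A minimal sketch of that idea, with all names illustrative rather than Kylin's actual API:

```python
# Sketch of the proposed improvement: project a lookup table down to only
# the columns a model actually references before building its snapshot.
# project_for_snapshot and the sample columns are hypothetical names.

def project_for_snapshot(rows, all_columns, used_columns):
    """Keep only the columns used by the model/cube."""
    keep = [all_columns.index(c) for c in used_columns]
    return [tuple(row[i] for i in keep) for row in rows]

# Example: a lookup table with several columns, of which the model uses two.
columns = ["seller_id", "seller_name", "address", "phone", "region"]
rows = [
    (1, "alice", "a st", "111", "us"),
    (2, "bob",   "b st", "222", "eu"),
]

snapshot_input = project_for_snapshot(rows, columns, ["seller_id", "seller_name"])
print(snapshot_input)  # → [(1, 'alice'), (2, 'bob')]
```

Building the snapshot from `snapshot_input` instead of the full rows would shrink it in proportion to the width of the dropped columns, without changing which dimensions the model can serve.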



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
