First, dictionary encoding does not fit an ultra-high-cardinality (UHC)
dimension. You need to change to the "integer" or "fixed_length" encoding
method. For "userid", if it is an integer/long number, "integer" is the best
match. The reason is that a dictionary needs to load all values into memory,
which will fill up the Java heap when the cardinality is high.
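
To illustrate the memory argument, here is a minimal Python sketch. It is
not Kylin code: ToyDictionary and encode_integer are hypothetical names, and
Kylin's real dictionary is a more compact structure. The point is only that
a dictionary has to keep every distinct value, while a fixed-width integer
encoding keeps no state at all.

```python
def encode_integer(value: int, width: int = 8) -> bytes:
    """Stateless fixed-width encoding: the bytes depend only on the value."""
    return value.to_bytes(width, byteorder="big", signed=True)


class ToyDictionary:
    """Toy dictionary encoding (hypothetical, for illustration only).

    Every distinct value seen so far is kept in memory, so the footprint
    grows with the cardinality of the column.
    """

    def __init__(self):
        self.ids = {}

    def encode(self, value) -> int:
        # Assign the next free id the first time a value is seen.
        return self.ids.setdefault(value, len(self.ids))


if __name__ == "__main__":
    d = ToyDictionary()
    for userid in range(100_000):
        d.encode(userid)
    # The dictionary holds one entry per distinct userid.
    print(len(d.ids))
    # The integer encoding holds nothing; each value maps to 8 bytes.
    print(encode_integer(12345).hex())
```

With 100 million distinct userids, the dictionary side of this sketch would
need an entry for every one of them, which is the same pressure that the
real dictionary puts on the Java heap.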

Besides, if you have multiple UHC dimensions in one cube, you'd better
customize the aggregation groups so that the UHC dimensions are not grouped
with each other.
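
As a rough sketch of why this helps, the Python below counts dimension
combinations (cuboids) for one aggregation group versus two. It is a
simplified model, not Kylin's planner: "deviceid", "city" and "day" are
made-up dimensions, and it ignores any base cuboid over all dimensions as
well as mandatory, hierarchy and joint rules.

```python
from itertools import combinations


def cuboids(group):
    """All non-empty dimension combinations of one aggregation group."""
    dims = sorted(group)
    return {
        frozenset(combo)
        for size in range(1, len(dims) + 1)
        for combo in combinations(dims, size)
    }


def count_with_both(groups, a, b):
    """Return (total cuboids, cuboids containing both dimensions a and b)."""
    all_cuboids = set().union(*(cuboids(g) for g in groups))
    both = sum(1 for c in all_cuboids if a in c and b in c)
    return len(all_cuboids), both


# Hypothetical cube: two UHC dimensions (userid, deviceid) plus two small ones.
single_group = [{"userid", "deviceid", "city", "day"}]
split_groups = [{"userid", "city", "day"}, {"deviceid", "city", "day"}]

if __name__ == "__main__":
    print(count_with_both(single_group, "userid", "deviceid"))
    print(count_with_both(split_groups, "userid", "deviceid"))
```

In this model, splitting the two UHC dimensions into separate groups leaves
no combination that contains both of them, and the total number of
combinations also drops.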


On July 6, 2017 at 10:26 AM, [email protected] <[email protected]> wrote:

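> Hi, we have run into problems building cubes with high-cardinality
> dimensions, for example userid, which can exceed 100 million values. The
> build fails very easily.
> Does Kylin have any successful cases of building a cube like this?
> Our cluster is fairly large, but the resources Kylin requests are not well
> sized, so data skew occurs easily and a single reducer fails.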
>
>
>
> [email protected]
>



-- 
Best regards,

Shaofeng Shi 史少锋
