Hi All: We are encountering some problems while supporting a requirement: a cube with several high-cardinality dimensions. Those dimensions are URLs, and users want to reference them in WHERE clauses and filter on them with LIKE. In addition, the cube has one count-distinct measure.
We have hit the following problems:

1. For one URL dimension, the cardinality is about 500,000 per day, and the fact_distinct_columns file is about 500 MB+. When we build the cube over more days, the job fails at the 'Build Dimension Dictionary' step (one dimension file is about 3 GB).
2. After building a one-day segment, we find that converting a LIKE filter into an IN filter is very slow, and the resulting filter is so large that the buffer goes out of bounds.
3. When executing SQL with count(distinct col), the coprocessor is disabled (why?), and the scanner returns so many tuples that it exceeds the context threshold and the query fails.

Has anyone encountered these problems, and how did you solve them, in the scenario of creating a cube with high-cardinality dimensions such as URLs? Any suggestions are welcome. Thanks a lot.
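For context, the queries that trigger problems 2 and 3 have roughly the following shape (table and column names here are made up for illustration, not our real schema):

```sql
-- Hypothetical query: LIKE on a high-cardinality URL dimension,
-- combined with a count-distinct measure on the same cube.
SELECT url,
       COUNT(DISTINCT user_id)   -- the count-distinct measure
FROM fact_visits                 -- hypothetical fact table name
WHERE url LIKE '%/product/%'     -- LIKE filter on the URL dimension
GROUP BY url;
```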