Hi All: We are encountering some problems while supporting a requirement: a cube with several high-cardinality dimensions. Those dimensions are URLs, and users want to reference them in WHERE clauses and filter on them with LIKE. In addition, the cube has one count-distinct measure.
We have hit the following problems:

1. For one URL dimension, the cardinality is about 500,000 per day, and the fact_distinct_columns file is about 500 MB+. When we build the cube over more days, the job fails at the 'Build Dimension Dictionary' step (one dimension file is about 3 GB).
2. After building a one-day segment, we find that converting a LIKE filter into an IN filter is very slow, and the resulting filter is so large that the buffer goes out of bounds.
3. When executing SQL with count(distinct col), the coprocessor is disabled (why?), and the scanner returns so many tuples that it exceeds the context threshold and the query fails.

Has anyone encountered these problems, and how did you solve them, in the scenario of creating a cube with high-cardinality dimensions such as URLs? Any suggestions are welcome. Thanks a lot.
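For context, the queries that trigger problems 2 and 3 have roughly the following shape (table and column names here are made up for illustration, not our real schema):

```sql
-- Hypothetical query: LIKE on a high-cardinality URL dimension,
-- combined with a count-distinct measure on the same cube.
SELECT url,
       COUNT(DISTINCT user_id)   -- the count-distinct measure
FROM fact_visits                 -- hypothetical fact table name
WHERE url LIKE '%/product/%'     -- LIKE filter on the URL dimension
GROUP BY url;
```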