By "a large number of candidates", do you mean that the dimension has very high cardinality? I assume so.
The way to optimize a cube with a high-cardinality dimension depends on your query pattern.

When queries filter on the high-cardinality dimension, it's best to put that dimension at the beginning of the row key, so that the filter can quickly skip unwanted cube rows.

If your queries filter on other dimensions and group by the high-cardinality dimension, a query can easily return a massive number of results. This scenario used to be a weakness of Kylin. However, we have recently been working on multiple improvements for similar scenarios: https://issues.apache.org/jira/browse/KYLIN-1428. After these improvements are released I'll write a blog post to explain all of them.

You can also check http://apache-kylin.74782.x6.nabble.com/How-to-use-kylin-with-high-cardinality-dimensions-td3661.html, which might be inspiring to you.

On Thu, Apr 21, 2016 at 11:02 AM, huawang <[email protected]> wrote:

> hi, I found that if a dimension of the cube has a large number of
> candidates, the query will be very slow. Is there any solution to this
> condition?

--
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
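
P.S. To illustrate the row key ordering point above, here is a minimal toy sketch in Python. It is not Kylin code: it only models a store whose rows are sorted by a composite key, and the dimension names (`user_id`, `country`), the cardinalities, and the helper functions are all made up for the example. It counts how many rows a filter has to touch when the filtered dimension leads the key versus when it trails.

```python
from bisect import bisect_left, bisect_right


def build_rows(order):
    """Build sorted toy row keys; `order` lists dimension names in row key order."""
    users = [f"u{i:04d}" for i in range(1000)]  # hypothetical high-cardinality dimension
    countries = ["CN", "US", "DE"]              # hypothetical low-cardinality dimension
    rows = []
    for u in users:
        for c in countries:
            record = {"user_id": u, "country": c}
            rows.append(tuple(record[d] for d in order))
    rows.sort()  # a sorted key-value store keeps rows in key order
    return rows


def rows_scanned(rows, order, dim, value):
    """Count the rows touched to answer the filter `dim = value`."""
    if order[0] == dim:
        # The filter is on the leading key component, so all matching rows
        # are contiguous and can be located with two binary searches.
        lo = bisect_left(rows, (value,))
        hi = bisect_right(rows, (value, chr(0x10FFFF)))
        return hi - lo
    # Otherwise the matching rows are scattered across the key space and,
    # in this simplified model, every row has to be inspected.
    return len(rows)


if __name__ == "__main__":
    for order in (("user_id", "country"), ("country", "user_id")):
        rows = build_rows(order)
        print(order, rows_scanned(rows, order, "user_id", "u0042"))
```

In this simplified model the filter touches only the rows for one `user_id` when that dimension leads the key, and the whole table when it trails. A real storage engine has more tricks than a full scan for the second case, so treat the numbers as an illustration of the trend only.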
