Hi,

I've been experimenting with Kylin for some time, and I ran into a difficult 
problem:

I have a cube (total size ~150GB, ~1.1B source records) with the following 
dimensions and cardinalities (as they are defined in the aggregation group):
date 10
dim0 250 STRING
dim1 60 STRING
dim2 3000 INT
dim3 7000 INT
dim4 30 INT
dim5 20 INT
dim6 30 INT
dim7 10 INT

When I execute queries like these (accept partial = false):

SELECT dim1, SUM(m0), SUM(m1), … FROM fact
WHERE date BETWEEN … AND
dim0 IN (10 values) AND
dim2 IN (10 values)
GROUP BY dim1 LIMIT 10;

SELECT dim7, SUM(m0), SUM(m1), … FROM fact
WHERE date BETWEEN … AND
dim0 IN (10 values) AND
dim2 IN (10 values) AND
dim3 IN (10 values)
GROUP BY dim7 LIMIT 10;

SELECT dim7, SUM(m0), SUM(m1), … FROM fact
WHERE date BETWEEN … AND
dim0 IN (10 values) AND
dim2 IN (10 values) AND
dim3 IN (10 values) AND
dim4 IN (10 values) AND
dim6 IN (10 values)
GROUP BY dim7 LIMIT 10;
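
To give a sense of scale, here is a rough Python sketch of how many
combinations of IN values each of the three queries implies. It only
multiplies the IN-list sizes given above and ignores the date range. My
assumption, which I have not verified against the Kylin source, is that the
work done for fuzzy keys grows with this product.

```python
from math import prod

# Number of values in each IN (...) filter, per query, as listed above.
# The date BETWEEN filter is a range, so it is left out of the count.
in_filters = {
    "query 1": {"dim0": 10, "dim2": 10},
    "query 2": {"dim0": 10, "dim2": 10, "dim3": 10},
    "query 3": {"dim0": 10, "dim2": 10, "dim3": 10, "dim4": 10, "dim6": 10},
}


def in_value_combinations(filters):
    """Return the size of the cartesian product of the IN-list values."""
    return prod(filters.values())


combos = {name: in_value_combinations(f) for name, f in in_filters.items()}

for name, count in combos.items():
    print(f"{name}: {count} combinations")
```

So the third query has 1000 times as many combinations as the first.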


Coprocessors consume 100% CPU on some of the region servers and never finish.
I tried to profile a region server and got the following:
http://i.imgur.com/yrKnDc1.png

I tried disabling the fuzzy key feature using backdoorToggles and got much 
better results: coprocessors no longer get stuck and I always get a response. 
Response times suffered a bit, but overall responsiveness is much better.
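
For reference, a minimal Python sketch of a query request body with the
toggle set. The field names (sql, project, acceptPartial, backdoorToggles)
and the toggle name DEBUG_TOGGLE_DISABLE_FUZZY_KEY are as I understand them
from Kylin's query API and its BackdoorToggles class, so treat them as
assumptions and check them against your version. The project name and SQL
below are placeholders.

```python
import json


def build_query_request(sql, project, disable_fuzzy_key=True):
    """Build a query request body (assumed shape; nothing is sent)."""
    toggles = {}
    if disable_fuzzy_key:
        # Assumed toggle name; backdoor toggle values are passed as strings.
        toggles["DEBUG_TOGGLE_DISABLE_FUZZY_KEY"] = "true"
    return {
        "sql": sql,
        "project": project,
        "acceptPartial": False,  # matches "accept partial = false" above
        "backdoorToggles": toggles,
    }


# "my_project" and the SQL text are hypothetical placeholders.
request_body = build_query_request("SELECT ... FROM fact", "my_project")
print(json.dumps(request_body, indent=2))
```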

Query times I get for the three queries above (accept partial = false):
1. 5-10 seconds
2. 30-100 seconds
3. 180-300 seconds

So my questions are:
1. Are there ways to improve query time for this kind of query?
2. Why do coprocessors consume 100% CPU and never finish when fuzzy key is 
enabled?

Thanks.
