Hi, I've been experimenting with Kylin for some time, and I ran into a difficult problem:
I have a cube (total size ~150 GB, ~1.1B source records) with the following dimensions and cardinalities (in the order they are defined in the aggregation group):

    Dimension  Cardinality  Type
    date       10
    dim0       250          STRING
    dim1       60           STRING
    dim2       3000         INT
    dim3       7000         INT
    dim4       30           INT
    dim5       20           INT
    dim6       30           INT
    dim7       10           INT

When I execute queries like these (accept partial = false):

1.
    SELECT dim1, SUM(m0), SUM(m1), …
    FROM fact
    WHERE date BETWEEN …
      AND dim0 IN (10 values)
      AND dim2 IN (10 values)
    GROUP BY dim1
    LIMIT 10;

2.
    SELECT dim7, SUM(m0), SUM(m1), …
    FROM fact
    WHERE date BETWEEN …
      AND dim0 IN (10 values)
      AND dim2 IN (10 values)
      AND dim3 IN (10 values)
    GROUP BY dim7
    LIMIT 10;

3.
    SELECT dim7, SUM(m0), SUM(m1), …
    FROM fact
    WHERE date BETWEEN …
      AND dim0 IN (10 values)
      AND dim2 IN (10 values)
      AND dim3 IN (10 values)
      AND dim4 IN (10 values)
      AND dim6 IN (10 values)
    GROUP BY dim7
    LIMIT 10;

the coprocessors consume 100% CPU on some of the region servers and never finish. I profiled one of the region servers and got the following:

http://i.imgur.com/yrKnDc1.png

I then tried disabling the fuzzy key feature using backdoorToggles and got much better results: the coprocessors no longer get stuck and I always get a response. Response time suffered a bit, but overall responsiveness is much better.

Query times with fuzzy key disabled (accept partial = false):

1. 5-10 seconds
2. 30-100 seconds
3. 180-300 seconds

So my questions are:

1. Are there ways to improve query time for this kind of query?
2. Why do the coprocessors consume 100% CPU and never finish when fuzzy key is enabled?

Thanks.
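
P.S. In case it helps with question 2: I have not checked how Kylin builds its fuzzy keys, so this is only a guess on my part. If one key pattern is generated per combination of IN values, the number of patterns grows multiplicatively with each IN filter. A rough Python sketch of that arithmetic for the three queries above (the dictionary and function names are my own, not anything from Kylin):

```python
from math import prod

# Number of values in each IN (...) list, per query, as described above.
in_list_sizes = {
    "query 1": [10, 10],              # dim0, dim2
    "query 2": [10, 10, 10],          # dim0, dim2, dim3
    "query 3": [10, 10, 10, 10, 10],  # dim0, dim2, dim3, dim4, dim6
}

def in_combinations(sizes):
    """Size of the Cartesian product of the IN lists.

    ASSUMPTION: one candidate key pattern per combination of IN values.
    This is a guess about the filter's behaviour, not confirmed.
    """
    return prod(sizes)

combos = {name: in_combinations(sizes) for name, sizes in in_list_sizes.items()}

for name, count in combos.items():
    print(f"{name}: {count} IN-value combinations")
```

If that guess is right, the third query would imply three orders of magnitude more patterns than the first, which might be related to what I see in the profile.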
