Hi,

 

Recently we were trying queries like this: select id, count(distinct no)
from the_table where id in (x1, x2, x3, .) group by id, where x1, x2, x3, .
are the actual id values.

And we found out that the performance of kylin query would drop
significantly if the values in where clause can't be translated into
dictionary id.

 

What I mean is that if let's say the cube doesn't contain id values of x2,
when running the above query, the total scan count will be much larger than
the scan count if the cube contains all the x values.

 

For example, we had a query of 39 x values in where clause and there is one
x value not in cube,  which yielded the following result:

Duration: 60.947


Cube Names: [olap]

Total scan count: 2524898

Result row count: 39

(The log shows "Can't translate value xxx to dictionary ID, roundingFlag 0.
Using default value \xFF")

 

And we excluded the x value that is not in cube and re-run the query and got
the another result:

Duration: 2.477

Cube Names: [olap]

Total scan count: 96543

Result row count: 38

 

The second query runs much faster just because it removes an id value that
is not in cube. Can anyone share some ideas about this? Would not building
dictionary and just using raw values for id column to build cube be a
solution to improve the performance?

 

Best Regards

Hua

Reply via email to