Hi,
Recently we were trying queries like this: select id, count(distinct no) from the_table where id in (x1, x2, x3, .) group by id, where x1, x2, x3, . are the actual id values. And we found out that the performance of kylin query would drop significantly if the values in where clause can't be translated into dictionary id. What I mean is that if let's say the cube doesn't contain id values of x2, when running the above query, the total scan count will be much larger than the scan count if the cube contains all the x values. For example, we had a query of 39 x values in where clause and there is one x value not in cube, which yielded the following result: Duration: 60.947 Cube Names: [olap] Total scan count: 2524898 Result row count: 39 (The log shows "Can't translate value xxx to dictionary ID, roundingFlag 0. Using default value \xFF") And we excluded the x value that is not in cube and re-run the query and got the another result: Duration: 2.477 Cube Names: [olap] Total scan count: 96543 Result row count: 38 The second query runs much faster just because it removes an id value that is not in cube. Can anyone share some ideas about this? Would not building dictionary and just using raw values for id column to build cube be a solution to improve the performance? Best Regards Hua
