Qianhao, have assigned you to the JIRA. Please attempt a fix. :-) On Sat, May 2, 2015 at 12:32 AM, 周千昊 <[email protected]> wrote:
> Hi, Huang > Actually using dictionary will not affect the scan range, because the > dictionary we are using will preserve the order. And one of the most > important reason to use dictionary is storage saving. > However for this case, some optimisation can be made, for example, > when the value in IN clause can not be translated into dictionary id, we > can simply ignore this value. > A jira ticket has been created: > https://issues.apache.org/jira/browse/KYLIN-747 > > Huang Hua <[email protected]>于2015年4月30日周四 下午6:15写道: > > > Hi, > > > > > > > > Recently we were trying queries like this: select id, count(distinct no) > > from the_table where id in (x1, x2, x3, .) group by id, where x1, x2, > x3, . > > are the actual id values. > > > > And we found out that the performance of kylin query would drop > > significantly if the values in where clause can't be translated into > > dictionary id. > > > > > > > > What I mean is that if let's say the cube doesn't contain id values of > x2, > > when running the above query, the total scan count will be much larger > than > > the scan count if the cube contains all the x values. > > > > > > > > For example, we had a query of 39 x values in where clause and there is > one > > x value not in cube, which yielded the following result: > > > > Duration: 60.947 > > > > > > Cube Names: [olap] > > > > Total scan count: 2524898 > > > > Result row count: 39 > > > > (The log shows "Can't translate value xxx to dictionary ID, roundingFlag > 0. > > Using default value \xFF") > > > > > > > > And we excluded the x value that is not in cube and re-run the query and > > got > > the another result: > > > > Duration: 2.477 > > > > Cube Names: [olap] > > > > Total scan count: 96543 > > > > Result row count: 38 > > > > > > > > The second query runs much faster just because it removes an id value > that > > is not in cube. Can anyone share some ideas about this? Would not > building > > dictionary and just using raw values for id column to build cube be a > > solution to improve the performance? > > > > > > > > Best Regards > > > > Hua > > > > >
