Hi, Huang
      Actually using dictionary will not affect the scan range, because the
dictionary we are using will preserve the order. And one of the most
important reason to use dictionary is storage saving.
      However for this case, some optimisation can be made, for example,
when the value in IN clause can not be translated into dictionary id, we
can simply ignore this value.
      A jira ticket has been created:
https://issues.apache.org/jira/browse/KYLIN-747

Huang Hua <[email protected]>于2015年4月30日周四 下午6:15写道:

> Hi,
>
>
>
> Recently we were trying queries like this: select id, count(distinct no)
> from the_table where id in (x1, x2, x3, .) group by id, where x1, x2, x3, .
> are the actual id values.
>
> And we found out that the performance of kylin query would drop
> significantly if the values in where clause can't be translated into
> dictionary id.
>
>
>
> What I mean is that if let's say the cube doesn't contain id values of x2,
> when running the above query, the total scan count will be much larger than
> the scan count if the cube contains all the x values.
>
>
>
> For example, we had a query of 39 x values in where clause and there is one
> x value not in cube,  which yielded the following result:
>
> Duration: 60.947
>
>
> Cube Names: [olap]
>
> Total scan count: 2524898
>
> Result row count: 39
>
> (The log shows "Can't translate value xxx to dictionary ID, roundingFlag 0.
> Using default value \xFF")
>
>
>
> And we excluded the x value that is not in cube and re-run the query and
> got
> the another result:
>
> Duration: 2.477
>
> Cube Names: [olap]
>
> Total scan count: 96543
>
> Result row count: 38
>
>
>
> The second query runs much faster just because it removes an id value that
> is not in cube. Can anyone share some ideas about this? Would not building
> dictionary and just using raw values for id column to build cube be a
> solution to improve the performance?
>
>
>
> Best Regards
>
> Hua
>
>

Reply via email to