Qianhao, I have assigned the JIRA to you.  Please attempt a fix.  :-)

On Sat, May 2, 2015 at 12:32 AM, 周千昊 <[email protected]> wrote:

> Hi, Huang
>       Actually, using a dictionary will not affect the scan range, because
> the dictionary we use preserves the order of values. One of the most
> important reasons to use a dictionary is the storage saving.
>       However, for this case some optimisation can be made. For example,
> when a value in the IN clause cannot be translated into a dictionary ID, we
> can simply ignore that value.
>       A JIRA ticket has been created:
> https://issues.apache.org/jira/browse/KYLIN-747
>
> On Thu, Apr 30, 2015 at 6:15 PM, Huang Hua <[email protected]> wrote:
>
> > Hi,
> >
> >
> >
> > Recently we were trying queries like this: select id, count(distinct no)
> > from the_table where id in (x1, x2, x3, ...) group by id, where x1, x2,
> > x3, ... are the actual id values.
> >
> > We found that Kylin query performance drops significantly if a value in
> > the where clause can't be translated into a dictionary ID.
> >
> >
> >
> > That is, if the cube doesn't contain, say, the id value x2, then the
> > total scan count when running the above query is much larger than the
> > scan count when the cube contains all the x values.
> >
> >
> >
> > For example, we ran a query with 39 x values in the where clause, one of
> > which was not in the cube, and it yielded the following result:
> >
> > Duration: 60.947
> >
> >
> > Cube Names: [olap]
> >
> > Total scan count: 2524898
> >
> > Result row count: 39
> >
> > (The log shows "Can't translate value xxx to dictionary ID, roundingFlag
> > 0. Using default value \xFF")
> >
> >
> >
> > We then excluded the x value that is not in the cube, re-ran the query,
> > and got another result:
> >
> > Duration: 2.477
> >
> > Cube Names: [olap]
> >
> > Total scan count: 96543
> >
> > Result row count: 38
> >
> >
> >
> > The second query runs much faster just because it removes an id value
> > that is not in the cube. Can anyone share some ideas about this? Would
> > skipping the dictionary and building the cube from raw values for the id
> > column be a solution to improve the performance?
> >
> >
> >
> > Best Regards
> >
> > Hua
> >
> >
>
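
For readers following the thread, here is a rough sketch of the two ideas in the quoted reply: an order-preserving dictionary, and dropping IN-list values that have no dictionary ID instead of mapping them to a default. This is only an illustration under stated assumptions, not Kylin's actual code. The names OrderedDictionary, id_of and translate_in_list are hypothetical, and the toy dictionary simply uses the position in the sorted value list as the ID.

```python
from bisect import bisect_left


class OrderedDictionary:
    """Toy order-preserving dictionary (hypothetical, not Kylin's class).

    The ID of a value is its position in the sorted list of distinct values,
    so comparing two IDs gives the same answer as comparing the two values.
    """

    def __init__(self, values):
        self._values = sorted(set(values))

    def id_of(self, value):
        """Return the dictionary ID of value, or None if the value is absent."""
        i = bisect_left(self._values, value)
        if i < len(self._values) and self._values[i] == value:
            return i
        return None


def translate_in_list(dictionary, in_values):
    """Translate IN-list values to IDs, dropping values the dictionary lacks.

    A value that is absent from the dictionary cannot match any row that was
    encoded with it, so dropping the value does not change the query result.
    """
    ids = []
    for value in in_values:
        value_id = dictionary.id_of(value)
        if value_id is not None:
            ids.append(value_id)
    return sorted(ids)


d = OrderedDictionary(["a10", "a20", "a30", "a40"])
# Order is preserved: "a10" < "a30" as values, and 0 < 2 as IDs.
print(d.id_of("a10"), d.id_of("a30"))  # prints "0 2"
# "a25" is not in the dictionary, so it is skipped rather than given a default ID.
print(translate_in_list(d, ["a30", "a25", "a10"]))  # prints "[0, 2]"
```

If every value in the IN list is dropped, the translated list is empty, which in this sketch means the filter cannot match anything and no scan would be needed.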
