Re: KYLIN-747 bad query performance when IN clause contains a value doesn't exist in the dictionary

Vadim Semenov Wed, 28 Oct 2015 19:53:56 -0700

Thank you!

I just tested the changes with queries like:
SELECT dt, SUM(metric) FROM table WHERE dt IN ('2015-10-01', '2015-10-02') GROUP BY dt;
SELECT dt, SUM(metric) FROM table WHERE dt BETWEEN '2015-10-01' AND '2015-10-07' GROUP BY dt;

Before the changes, every partition was scanned; after them, only the relevant 
partitions are scanned.

Very useful performance change, thanks again.

On October 28, 2015 at 4:02:19 AM, Li Yang ([email protected]) wrote:

Hmm, I searched the commits for KYLIN-747 and found nothing either. I thought 
it was covered by a fix to some other JIRA. 

Anyway, I wrote some test cases that query non-existent values, and made 
some further optimizations. Now an always-false scan range can be pruned. 

https://github.com/apache/incubator-kylin/commit/f96600e89a5a2c0e533cea86ad6a73b9451bcddc
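
To illustrate the idea of pruning an always-false scan range, here is a minimal Python sketch. It is not Kylin's actual code: the class `SortedDictionary` and the functions `scan_ranges_for_in` and `scan_ranges_for_between` are hypothetical names, and the sketch assumes a sorted, order-preserving dictionary that maps column values to dense integer IDs. A filter value that has no dictionary ID contributes no scan range at all, instead of falling back to a default value that widens the scan.

```python
from bisect import bisect_left, bisect_right


class SortedDictionary:
    """Toy order-preserving dictionary: value -> dense integer ID (hypothetical)."""

    def __init__(self, values):
        self.values = sorted(set(values))

    def id_of(self, value):
        """Return the ID of an exact match, or None if the value is absent."""
        i = bisect_left(self.values, value)
        if i < len(self.values) and self.values[i] == value:
            return i
        return None

    def id_range(self, low, high):
        """Return the inclusive ID range covering [low, high], or None if empty."""
        lo = bisect_left(self.values, low)        # round the lower bound up
        hi = bisect_right(self.values, high) - 1  # round the upper bound down
        return (lo, hi) if lo <= hi else None


def scan_ranges_for_in(dictionary, in_values):
    """One point range per IN value that exists; missing values are dropped."""
    ids = sorted({i for i in map(dictionary.id_of, in_values) if i is not None})
    return [(i, i) for i in ids]


def scan_ranges_for_between(dictionary, low, high):
    """A single range for BETWEEN, or no range when nothing can match."""
    r = dictionary.id_range(low, high)
    return [r] if r is not None else []


d = SortedDictionary(["2015-10-01", "2015-10-02", "2015-10-03"])
print(scan_ranges_for_in(d, ["2015-10-01", "2015-09-15"]))
print(scan_ranges_for_in(d, ["2014-01-01"]))
print(scan_ranges_for_between(d, "2015-10-02", "2015-10-07"))
print(scan_ranges_for_between(d, "2016-01-01", "2016-01-07"))
```

An empty list of ranges means the filter can never be true, so the caller can skip the scan entirely rather than reading every partition.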

On Tue, Oct 27, 2015 at 12:03 PM, Vadim Semenov <[email protected]> wrote: 

> Hi everyone, 
> 
> I'm trying to find the commit where this issue was fixed ( 
> https://issues.apache.org/jira/browse/KYLIN-747). 
> Could you point me to it? 
> 
> We have a cube that is partitioned by date, and when we query using SELECT 
> * FROM table WHERE dt = '2015-10-01', I see in the logs: 
> "Can't translate value 2015-10-01 to dictionary ID, roundingFlag 0. Using 
> default value \xFF", 
> which translates into a huge scan.
