[ 
https://issues.apache.org/jira/browse/KYLIN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656425#comment-16656425
 ] 

WangBo commented on KYLIN-3634:
-------------------------------

Hi shaofeng,thank you for reviewing.

Q:“Do you mean that, "NULL" should not be matched as  " <> 'abc'" ?”

A:yes;hive has the same logic,“select 'abc' != null” returns null,this means 
the row which contains null will be excluded.

 

Q:why this may cause the result is 0?

A:First, I will show this by an example.Using the table A in above example.

day            city         price

20180101 null          10

20180101 beijing      20
20180101 shanghai 10

==aggregate by datekey in hbase coprocessor==>

day            city  price

20180101 null    40      

==calcite  dynamical code in kylin server==> 

if(city != null and city.equals("abc")){

   current_tuple = .....  

     return true

}

return false;

---------------------------------------------------------------------------------------------

see that the three rows share the same aggregation key(day,city), the process 
of generating aggregation key is random,this means the key may contain null 
value.And calcite will filter the row which city column has null value.

the key point here is,three rows actually generate different aggregation 
key(day,city) value,but the comparator  of aggBufMap(used for aggragating,see 
code logic in GTAggregateScanner.java)just compare the key which is 'day'(this 
happens when not all the dimensions used in group by)

>From the perspective of aggBufMap,value of aggregation key of three rows are 
>the same.

So if the row(2018,null,10) was selected to generate the aggretation,then bad 
case happens

 

Second:the case of returning 0 rows is hardly to recurrent.it depends the data 
distribution in hbase.I build the same cube in dev production,but didn't get 
the bad case.

> When filter column has null value may cause incorrect query result
> ------------------------------------------------------------------
>
>                 Key: KYLIN-3634
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3634
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: v2.0.0
>            Reporter: WangBo
>            Assignee: WangBo
>            Priority: Major
>         Attachments: 
> 0001-KYLIN-3634-when-filter-column-has-null-value-may-cau.patch
>
>
> h1. Question
> when a column has null value,and using it as a filter column when querying, 
> and the filter value is not exist in the table,this may cause incorrect result
> h1. An Example
> h2. Table A
> the table A has three rows,city column of one row has null value
>  
> ||day||...||city||price||
> |20180101| |null|10|
> |20180101| |beijing|20|
> |20180101| |shanghai|10|
> h2. Query SQL
> select day,sum(price) from a where city <> 'abc' group by day
> h2. Correct Result
> exclude the row contains null city value
> ||day||col||
> |20180101|30|
> h2. InCorrect Result
> resullt 0 rows
> this happens in our production environment,the kylin version is 2.0.0
> h1. Analysis process
> 1,city column dosen't have a value,so the CompareTupleFilter will turn into  
> ConstantTupleFilter(see GTUtil.java)
> 2,if dimensions in the sql dosen't match all the columns using in group 
> by,the  bytesComparator used in hbase aggregation map will only compare the 
> columns using in group by
> 3,when GTAggregateScanner constructs key of aggBufMap,the key may contains 
> null value,because the comparator of aggBufMap only compares group by 
> columns,so the tuple share same group by columns may also share the same keys 
> which contains null value;This may cause kylin server receives tuples 
> contains null value;
> 4,when the code which dynamically generated by calcilte deals tuples using 
> filter,it first judges whether the column is null.Because filter column in 
> the tuple contains null value,so it always return false, no tuples will 
> return.
> h1. Solution
> when the filter column value is a invalid means not in the table,turn the 
> CompareTupleFiter into IS_NOT_NULL filter,instead of ConstantTupleFilter.TURE
>  
> Now I have test the feature in our production environment ;
> test in “mvn test” had passed,but not test in sandbox
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to