[ https://issues.apache.org/jira/browse/KYLIN-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318709#comment-17318709 ]

zhimin wu commented on KYLIN-4969:
----------------------------------

*Root Cause:*

A filter condition in a query can take effect in three places during query 
execution:
Step 1. Coarse-grained pruning by the segment pruner and the shard pruner, 
which filters data before it is read.
Step 2. Filter pushdown, applied while the data is being read. The data being 
read contains only normal dimensions, not derived dimensions, so a filter on a 
derived dimension must first be converted into a filter on the corresponding 
normal dimension, using the snapshot of the corresponding dimension table. When 
an exact conversion is not possible, a coarse-grained filter is pushed down 
instead.
Step 3. After the data has been read, the Calcite query plan is executed over 
it, and the exact filter conditions in the plan are applied.
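A minimal sketch of the Step 2 conversion, assuming the dimension table snapshot can be modelled as a map from host (normal) column key to derived value. All class and method names here are hypothetical, not Kylin's actual API, and the code needs Java 16+ for records. An equality filter against a constant translates exactly into a set of allowed host keys, while a comparison between two derived columns has no such per-column translation, so the sketch falls back to a keep-everything filter and flags it as inexact.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: translating a filter on a derived dimension into a filter
// on the host (normal) dimension via a snapshot. Not Kylin's real classes.
public class DerivedFilterSketch {

    // allowedHostKeys == null means "no constraint" (keep every row).
    record Translated(Set<Integer> allowedHostKeys, boolean exact) {}

    // "derivedColumn = constant": every host key can be checked against the
    // snapshot, so the translated filter is exact.
    static Translated translateEqualsConstant(Map<Integer, String> snapshot, String constant) {
        Set<Integer> keys = new TreeSet<>();
        for (Map.Entry<Integer, String> e : snapshot.entrySet()) {
            if (e.getValue().equals(constant)) {
                keys.add(e.getKey());
            }
        }
        return new Translated(keys, true);
    }

    // "derivedColA = derivedColB" on two different host columns: the result
    // depends on the pair of values in each row, so it cannot be expressed as a
    // set of allowed values for one host column. Fall back to a coarse filter
    // that keeps everything, and mark it as NOT exact.
    static Translated translateColumnEqualsColumn() {
        return new Translated(null, false);
    }

    public static void main(String[] args) {
        Map<Integer, String> regionSnapshot = Map.of(0, "AFRICA", 1, "AMERICA", 2, "ASIA");
        System.out.println(translateEqualsConstant(regionSnapshot, "ASIA"));
        System.out.println(translateColumnEqualsColumn());
    }
}
```

The explicit `exact` flag is the piece of information that later steps depend on: a caller that ignores it cannot tell a precise filter from a keep-everything fallback.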

The `stream aggregate` runs between Step 2 and Step 3, and this ordering is the 
root cause of the incorrect query results (see 
https://issues.apache.org/jira/browse/KYLIN-2501).
If Step 2 does not filter exactly, this early stream aggregate must add the 
columns referenced by the inexact filter conditions to its group-by columns, so 
that the data is still correct when Step 3 applies the exact filter.
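The rule above can be sketched as a small function: the early aggregate groups by the query's own group-by columns plus every column of a filter that Step 2 did not apply exactly. The names `PushedFilter` and `earlyAggregateGroupColumns`, and the group-by column `YEAR`, are hypothetical; the actual query from this issue is only in the attached screenshots. Requires Java 16+ for records.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the group-by rule for an early (stream) aggregate.
public class StreamAggGroupSketch {

    // A pushed-down filter, the columns it references, and whether Step 2
    // applied it exactly.
    record PushedFilter(String description, Set<String> columns, boolean exact) {}

    static Set<String> earlyAggregateGroupColumns(Set<String> queryGroupBy, List<PushedFilter> filters) {
        Set<String> group = new LinkedHashSet<>(queryGroupBy);
        for (PushedFilter f : filters) {
            if (!f.exact()) {
                // Step 3 still has to evaluate this filter, so its columns
                // must survive the early aggregation.
                group.addAll(f.columns());
            }
        }
        return group;
    }

    public static void main(String[] args) {
        Set<String> groupBy = new LinkedHashSet<>(List.of("YEAR")); // hypothetical
        List<PushedFilter> filters = List.of(
            new PushedFilter("SALES_REGION.R_NAME = BUY_REGION.R_NAME",
                new LinkedHashSet<>(List.of("SALES_REGION.R_NAME", "BUY_REGION.R_NAME")),
                false));
        System.out.println(earlyAggregateGroupColumns(groupBy, filters));
        // prints [YEAR, SALES_REGION.R_NAME, BUY_REGION.R_NAME]
    }
}
```

An exact filter contributes nothing here, because its rows have already been removed in Step 2 and Step 3 has nothing left to check for it.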

In the SQL from this issue, the filter condition SALES_REGION.R_NAME = 
BUY_REGION.R_NAME cannot be converted without the actual column data, yet it 
is incorrectly converted into a coarse-grained filter. Moreover, the system 
wrongly treats the converted filter as an exact filter, so the corresponding 
columns are not added to the group-by columns of the subsequent stream 
aggregate. As a result, the data entering Step 3 is already incorrect, and so 
are the query results.
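To see why the missing group-by column matters, here is a simplified model with made-up rows, assuming a hypothetical query that computes SUM(amount) GROUP BY SALES_REGION.R_NAME with the filter above. It treats the derived region names as already resolved and models the buggy path as effectively losing the filter after the early aggregate; Kylin's actual failure may look different, and the sketch only illustrates how aggregating before an inexact filter, without the filter's columns, produces wrong sums. Requires Java 16+ for records.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of early aggregation before an inexact filter (hypothetical data).
public class EarlyAggregationDemo {

    // One fact row with both derived region names already resolved.
    record Row(String salesRegion, String buyRegion, long amount) {}

    static final List<Row> ROWS = List.of(
        new Row("ASIA", "ASIA", 10),
        new Row("ASIA", "EUROPE", 5),
        new Row("EUROPE", "EUROPE", 7),
        new Row("EUROPE", "ASIA", 3));

    // Reference answer: filter each row exactly, then SUM(amount) GROUP BY salesRegion.
    static Map<String, Long> exact() {
        Map<String, Long> out = new TreeMap<>();
        for (Row r : ROWS) {
            if (r.salesRegion().equals(r.buyRegion())) {
                out.merge(r.salesRegion(), r.amount(), Long::sum);
            }
        }
        return out;
    }

    // Buggy path: the coarse filter keeps every row but is treated as exact, so
    // the early aggregate groups by salesRegion only and buyRegion is lost.
    static Map<String, Long> buggy() {
        Map<String, Long> out = new TreeMap<>();
        for (Row r : ROWS) {
            out.merge(r.salesRegion(), r.amount(), Long::sum);
        }
        return out;
    }

    // Fixed path: the early aggregate also groups by buyRegion, so Step 3 can
    // still apply the exact filter before the final aggregation.
    static Map<String, Long> fixed() {
        Map<List<String>, Long> partial = new HashMap<>();
        for (Row r : ROWS) {
            partial.merge(List.of(r.salesRegion(), r.buyRegion()), r.amount(), Long::sum);
        }
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<List<String>, Long> e : partial.entrySet()) {
            String sales = e.getKey().get(0);
            String buy = e.getKey().get(1);
            if (sales.equals(buy)) { // exact filter applied in Step 3
                out.merge(sales, e.getValue(), Long::sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("exact: " + exact()); // exact: {ASIA=10, EUROPE=7}
        System.out.println("buggy: " + buggy()); // buggy: {ASIA=15, EUROPE=10}
        System.out.println("fixed: " + fixed()); // fixed: {ASIA=10, EUROPE=7}
    }
}
```

Grouping the early aggregate by both region names reproduces the exact answer, at the cost of more intermediate rows reaching Step 3.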

> Query results may be incorrect if the query filter condition contains derived 
> dimensions
> ----------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4969
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4969
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: v3.1.1
>            Reporter: zhimin wu
>            Priority: Major
>         Attachments: image-2021-04-11-16-11-03-058.png, 
> image-2021-04-11-16-14-10-698.png, image-2021-04-11-16-15-52-352.png, 
> image-2021-04-11-16-17-45-890.png, image-2021-04-11-16-21-53-243.png
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> There are three tables.
> !image-2021-04-11-16-11-03-058.png|width=431,height=280!
> The model is as follows:
> !image-2021-04-11-16-14-10-698.png|width=824,height=328!
> The cube is as follows:
> !image-2021-04-11-16-15-52-352.png|width=765,height=425!
> The query result on Kylin:
> !image-2021-04-11-16-17-45-890.png|width=762,height=450!
> The result on Hive:
> !image-2021-04-11-16-21-53-243.png|width=1207,height=913!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
