[
https://issues.apache.org/jira/browse/KYLIN-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318709#comment-17318709
]
zhimin wu commented on KYLIN-4969:
----------------------------------
*Root Cause:*
A filter condition can take effect at three points during a query:
Step 1. Coarse-grained segment pruner and shard pruner. Data is filtered out
before it is read.
Step 2. Filter pushdown, performed while the data is being read. The stored
data contains only normal dimensions, not derived dimensions, so a filter on a
derived dimension must be converted to a filter on the normal dimension using
the snapshot of the corresponding dimension table. When an exact conversion is
not possible, a coarse-grained filter is pushed down instead.
Step 3. After the data is read, the Calcite query plan is walked, and the
precise filter conditions carried in that plan are applied.
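The Step 2 conversion can be illustrated with a minimal Python sketch. This is
not Kylin's actual code: the snapshot layout (foreign key to derived value) and
the names `snapshot` and `derived_eq_to_fk_in` are assumptions made for
illustration only. It covers the case where the conversion is exact, namely a
derived column compared with a constant.

```python
# Hypothetical snapshot of a lookup table: foreign key (a normal dimension,
# stored in the cube) -> derived dimension value (not stored in the cube).
snapshot = {1: "ASIA", 2: "EUROPE", 3: "ASIA"}


def derived_eq_to_fk_in(snapshot, derived_value):
    """Rewrite `derived = value` as `fk IN (...)` by scanning the snapshot."""
    return {fk for fk, value in snapshot.items() if value == derived_value}


# Filter on the derived column becomes a filter on the stored key column.
fk_filter = derived_eq_to_fk_in(snapshot, "ASIA")
print(sorted(fk_filter))  # prints [1, 3]
```

A comparison between two derived columns has no constant to look up in the
snapshot, so this kind of rewrite does not apply to it.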
The `stream aggregate` is executed between Steps 2 and 3, which is the root
cause of the incorrect query results. See
https://issues.apache.org/jira/browse/KYLIN-2501
If exact filtering is not achieved in Step 2, the stream aggregate that runs
ahead of Step 3 must add the columns referenced by the inexactly converted
filter conditions to its Aggregate group, so that the data is still accurate
when Step 3 is carried out.
In the SQL in question, the filter condition
SALES_REGION.R_NAME = BUY_REGION.R_NAME cannot be converted without the actual
column data, but here it was incorrectly converted to a coarse-grained filter.
Moreover, the system mistakenly treated the converted filter as an exact
filter, so the corresponding columns were not added to the Aggregate group of
the subsequent stream aggregate. The data entering Step 3 was therefore
incorrect, and so were the query results.
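The effect can be reproduced in a small Python sketch. It is an illustration
under stated assumptions, not Kylin's implementation: the rows, the key columns
`sales_key` and `buy_key`, the `REGION` snapshot and the helper names are all
hypothetical, and the coarse-grained Step 2 filter is modelled as letting every
row through.

```python
from collections import defaultdict

# Hypothetical snapshot shared by both lookups: region key -> R_NAME.
REGION = {1: "ASIA", 2: "EUROPE"}

# Hypothetical fact rows that passed the coarse-grained Step 2 filter.
ROWS = [
    {"sales_key": 1, "buy_key": 1, "amount": 10},
    {"sales_key": 1, "buy_key": 2, "amount": 20},
    {"sales_key": 2, "buy_key": 2, "amount": 30},
]


def stream_aggregate(rows, group_cols):
    """Sum `amount` per group; columns outside group_cols are lost."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[c] for c in group_cols)] += row["amount"]
    return [dict(zip(group_cols, key), amount=total)
            for key, total in totals.items()]


def precise_filter(row):
    """Step 3 filter: SALES_REGION.R_NAME = BUY_REGION.R_NAME."""
    return REGION[row["sales_key"]] == REGION[row["buy_key"]]


# Buggy path: the filter is assumed exact, so the key columns are not kept.
# The rows collapse into one total and Step 3 has nothing left to filter on.
wrong_total = sum(r["amount"] for r in stream_aggregate(ROWS, []))

# Fixed path: keep the filter columns in the group, then filter in Step 3.
kept = stream_aggregate(ROWS, ["sales_key", "buy_key"])
correct_total = sum(r["amount"] for r in kept if precise_filter(r))

print(wrong_total, correct_total)  # prints 60 40
```

Keeping the filter columns in the group makes the intermediate result larger,
but it is what allows the precise filter to be evaluated afterwards.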
> Query results may be incorrect if the query filter condition contains derived
> dimensions
> ----------------------------------------------------------------------------------------
>
> Key: KYLIN-4969
> URL: https://issues.apache.org/jira/browse/KYLIN-4969
> Project: Kylin
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: v3.1.1
> Reporter: zhimin wu
> Priority: Major
> Attachments: image-2021-04-11-16-11-03-058.png,
> image-2021-04-11-16-14-10-698.png, image-2021-04-11-16-15-52-352.png,
> image-2021-04-11-16-17-45-890.png, image-2021-04-11-16-21-53-243.png
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> There are three tables.
> !image-2021-04-11-16-11-03-058.png|width=431,height=280!
> Model as follows
> !image-2021-04-11-16-14-10-698.png|width=824,height=328!
> Cube as follows
> !image-2021-04-11-16-15-52-352.png|width=765,height=425!
> Query result on Kylin
> !image-2021-04-11-16-17-45-890.png|width=762,height=450!
> Result on Hive
> !image-2021-04-11-16-21-53-243.png|width=1207,height=913!
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)