[jira] [Commented] (KYLIN-5740) When the query includes multiple aggregate functions, and the measure corresponding to the aggregate functions is not in the aggregate index, the query may report an error

zhong.zhu (Jira) Sun, 10 Dec 2023 22:40:04 -0800


    [ 
https://issues.apache.org/jira/browse/KYLIN-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795184#comment-17795184
 ]


zhong.zhu commented on KYLIN-5740:
----------------------------------

h1. Dev Design

Postponing the de-duplication logic and downgrading it to 
applyDimAsMeasureStrategy ensures that if the model doesn't have a metric that 
matches the query, every Agg after parsing has a chance of being marked as 
DimAsMeasure so that it can hit the indexes

 
h1. Root Cause

!image-2023-12-11-14-32-11-954.png!

The set in the figure above will de-duplicate {*}_sqlDigest.aggregations_{*}, 
and the logic for determining whether the aggregations are the same is that as 
long as the values of the attributes are the same, they can be determined to be 
the same.

After a series of translations of the sql given in the description, two Aggs 
will be generated, and the express and parameter of these two Aggs are the 
same, calling equal true.

So after de-duplication, only one of the Aggs will be retained.

Next, the unmatchedAggregations method is used to match the Measure and AggFunc 
on the Layout.

 

{*}1 . If there is a Measure on the current Layout (e.g., an aggregation group 
that contains the MAX(BIG_REGION_NAME metric) that matches this Aggregate, the 
Aggregate can be used as the Aggregate.{*}{*}{*}

The removed AggFunc will not be marked as {_}DimAsMeasure{_}. those Aggs that 
are not marked as DimAsMeasure will be rewritten as agg for cube metrics 
columns in the OLAPAggregateRel# implementRewrite. The agg operations on the 
source table columns in elementRewrite are rewritten as agg on the cube metrics 
columns, e.g., Max(BIG_REGION_NAME) will be rewritten as 
MAX("_KY_MAX_LT859_BIG_REGION_NAME") in this example.

!image-2023-12-11-14-36-45-263.png!

Then go through the rewritten FieldName to get the index of that Measure, and 
finally encapsulate it into a new AggregateCall to return.

!image-2023-12-11-14-37-08-556.png!

Notice in the above figure getInput().getRowType(), the RowType stores the 
content as the same metric column as the query field in index under the 
selected model
 # If all the layouts under the current model do not have the metrics used in 
the query, then there will be no rewritten Field in the RowType, so the Agg 
that misses the label dim_as_measure goes to here and gets the Field as Null.
 # If there is a metric in the query, then there is no problem here.

 

*2. If a Measure does not exist on the current Layout (e.g., the aggregation 
group does not contain a MAX(BIG_REGION_NAME metric) that matches this 
Aggregate, then the Aggregate will not match the Measure.*

After the first code, we can see that due to the de-duplicate operation, only 
one of the two Aggs will be retained, and since there is no Max Measure on the 
current layout, this Agg will go to *_applyDimAsMeasureStrategy_* logic, and 
then it will be labeled as {*}_DimensionAsMeasure_{*}, and the other Agg will 
go to rewrite logic, i.e. _KY_Rewrite_Filed will not exist in the field list of 
RowType. One Agg will go to rewrite logic, through the above analysis of 
RowType, it can be found that at this time there will be no filed after rewrite 
in the field list of RowType, i.e., _KY_MAX_LT859_BIG_REGION_NAME, so the 
second Agg will go to rewrite logic when it goes to rewrite logic. When the 
second Agg goes to rewrite logic, it will report NEP because it can't find the 
filed in the field list.

 

 

 
 

> When the query includes multiple aggregate functions, and the measure 
> corresponding to the aggregate functions is not in the aggregate index, the 
> query may report an error
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-5740
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5740
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: 5.0-beta
>            Reporter: zhong.zhu
>            Assignee: zhong.zhu
>            Priority: Critical
>             Fix For: 5.0.0
>
>         Attachments: image-2023-12-11-14-25-14-409.png, 
> image-2023-12-11-14-32-11-954.png, image-2023-12-11-14-36-45-263.png, 
> image-2023-12-11-14-37-08-556.png
>
>
> Reproduction method:
> Table Building Statements
> {code:sql}
> CREATE EXTERNAL TABLE LT859 (
>     `PRO_MT_SEASON` varchar(4096),
>     `ARRIVAL_TRANSIT_IN_PRM_AMT` decimal(18,4),
>     `BIG_REGION_NAME` varchar(4096),
>     `REGION_NAME` varchar(4096)
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> STORED AS TEXTFILE;LOAD DATA INPATH '/path/to/data/LT859.csv' OVERWRITE INTO 
> TABLE LT859;
> {code}
> Data（LT859.csv）
> {code:bash}
> 22Q1,150.12,东区,华东区
> 22Q2,200.59,中区,华中区
> 22Q3,301.55,上海区,华东区
> {code}
> Select SQL:
> {code:sql}
> select
>   max(BIG_REGION_NAME) "max1",
>   max(
>     case when 1=1 then "BIG_REGION_NAME"
>     end
>   ) "max2"
> from XXX.LT859
> {code}
> Create model, dimension 4 columns selected, add metrics MAX(BIG_REGION_NAME)
> Save the model and delete the base aggregation index, manually add an 
> aggregation group as a replacement for the base aggregation, but it can't 
> have the MAX(BIG_REGION_NAME) metric in it to mimic the dim as measure 
> scenario, and build the full amount.
> Using the above select sql query, you can see that NPE is reported in the 
> background:
> !image-2023-12-11-14-25-14-409.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (KYLIN-5740) When the query includes multiple aggregate functions, and the measure corresponding to the aggregate functions is not in the aggregate index, the query may report an error

Reply via email to