[
https://issues.apache.org/jira/browse/KYLIN-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795184#comment-17795184
]
zhong.zhu commented on KYLIN-5740:
----------------------------------
h1. Dev Design
Postponing the de-duplication logic and downgrading it to
applyDimAsMeasureStrategy ensures that if the model doesn't have a metric that
matches the query, every Agg after parsing has a chance of being marked as
DimAsMeasure so that it can hit the indexes
h1. Root Cause
!image-2023-12-11-14-32-11-954.png!
The set in the figure above will de-duplicate {*}_sqlDigest.aggregations_{*},
and the logic for determining whether the aggregations are the same is that as
long as the values of the attributes are the same, they can be determined to be
the same.
After a series of translations of the sql given in the description, two Aggs
will be generated, and the express and parameter of these two Aggs are the
same, calling equal true.
So after de-duplication, only one of the Aggs will be retained.
Next, the unmatchedAggregations method is used to match the Measure and AggFunc
on the Layout.
{*}1 . If there is a Measure on the current Layout (e.g., an aggregation group
that contains the MAX(BIG_REGION_NAME metric) that matches this Aggregate, the
Aggregate can be used as the Aggregate.{*}{*}{*}
The removed AggFunc will not be marked as {_}DimAsMeasure{_}. those Aggs that
are not marked as DimAsMeasure will be rewritten as agg for cube metrics
columns in the OLAPAggregateRel# implementRewrite. The agg operations on the
source table columns in elementRewrite are rewritten as agg on the cube metrics
columns, e.g., Max(BIG_REGION_NAME) will be rewritten as
MAX("_KY_MAX_LT859_BIG_REGION_NAME") in this example.
!image-2023-12-11-14-36-45-263.png!
Then go through the rewritten FieldName to get the index of that Measure, and
finally encapsulate it into a new AggregateCall to return.
!image-2023-12-11-14-37-08-556.png!
Notice in the above figure getInput().getRowType(), the RowType stores the
content as the same metric column as the query field in index under the
selected model
# If all the layouts under the current model do not have the metrics used in
the query, then there will be no rewritten Field in the RowType, so the Agg
that misses the label dim_as_measure goes to here and gets the Field as Null.
# If there is a metric in the query, then there is no problem here.
*2. If a Measure does not exist on the current Layout (e.g., the aggregation
group does not contain a MAX(BIG_REGION_NAME metric) that matches this
Aggregate, then the Aggregate will not match the Measure.*
After the first code, we can see that due to the de-duplicate operation, only
one of the two Aggs will be retained, and since there is no Max Measure on the
current layout, this Agg will go to *_applyDimAsMeasureStrategy_* logic, and
then it will be labeled as {*}_DimensionAsMeasure_{*}, and the other Agg will
go to rewrite logic, i.e. _KY_Rewrite_Filed will not exist in the field list of
RowType. One Agg will go to rewrite logic, through the above analysis of
RowType, it can be found that at this time there will be no filed after rewrite
in the field list of RowType, i.e., _KY_MAX_LT859_BIG_REGION_NAME, so the
second Agg will go to rewrite logic when it goes to rewrite logic. When the
second Agg goes to rewrite logic, it will report NEP because it can't find the
filed in the field list.
> When the query includes multiple aggregate functions, and the measure
> corresponding to the aggregate functions is not in the aggregate index, the
> query may report an error
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: KYLIN-5740
> URL: https://issues.apache.org/jira/browse/KYLIN-5740
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0-beta
> Reporter: zhong.zhu
> Assignee: zhong.zhu
> Priority: Critical
> Fix For: 5.0.0
>
> Attachments: image-2023-12-11-14-25-14-409.png,
> image-2023-12-11-14-32-11-954.png, image-2023-12-11-14-36-45-263.png,
> image-2023-12-11-14-37-08-556.png
>
>
> Reproduction method:
> Table Building Statements
> {code:sql}
> CREATE EXTERNAL TABLE LT859 (
> `PRO_MT_SEASON` varchar(4096),
> `ARRIVAL_TRANSIT_IN_PRM_AMT` decimal(18,4),
> `BIG_REGION_NAME` varchar(4096),
> `REGION_NAME` varchar(4096)
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> STORED AS TEXTFILE;LOAD DATA INPATH '/path/to/data/LT859.csv' OVERWRITE INTO
> TABLE LT859;
> {code}
> Data(LT859.csv)
> {code:bash}
> 22Q1,150.12,东区,华东区
> 22Q2,200.59,中区,华中区
> 22Q3,301.55,上海区,华东区
> {code}
> Select SQL:
> {code:sql}
> select
> max(BIG_REGION_NAME) "max1",
> max(
> case when 1=1 then "BIG_REGION_NAME"
> end
> ) "max2"
> from XXX.LT859
> {code}
> Create model, dimension 4 columns selected, add metrics MAX(BIG_REGION_NAME)
> Save the model and delete the base aggregation index, manually add an
> aggregation group as a replacement for the base aggregation, but it can't
> have the MAX(BIG_REGION_NAME) metric in it to mimic the dim as measure
> scenario, and build the full amount.
> Using the above select sql query, you can see that NPE is reported in the
> background:
> !image-2023-12-11-14-25-14-409.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)