[
https://issues.apache.org/jira/browse/CALCITE-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356691#comment-17356691
]
Rafay A edited comment on CALCITE-4636 at 6/4/21, 10:13 PM:
------------------------------------------------------------
[~julianhyde]: The reason this is happening is because when we create the
RelCollation, we try to find the node in the group expressions first, which
returns the reference in the current group set, but if the field is present in
the group set, it may or may not match with the field index in the input node,
hence the mismatch and then we wrongly add a new expression in
lookupOrCreateGroupExpr(...) method.
My solution is to disable the agg mode, which will force the
convertExpression(...) function to return the correct RexNode without looking
in the grouping set.
was (Author: rafay):
[~julianhyde]: The reason this is happening is because when we create the
RelCollation, we try to find the node in the group expressions first, which
returns the reference in the current group set, but if the field is present in
the group set, it may or may not match with the field index in the input node,
hence the mismatch and then we wrongly add a new expression in
lookupOrCreateGroupExpr(...) method.
My solution is to disable the agg mode, which will force the
convertExpression(...) function to return the correct RexNode.
> Wrong RelCollation if Group by field appears in Order by clause
> ---------------------------------------------------------------
>
> Key: CALCITE-4636
> URL: https://issues.apache.org/jira/browse/CALCITE-4636
> Project: Calcite
> Issue Type: Bug
> Components: core
> Reporter: Rafay A
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> If we run this query:
> {code:java}
> select regionkey, collect(regionkey)
> within group (order by regionkey)
> from (SELECT * FROM (VALUES('blah','blah','blah',1,1)) as
> TBL(f1,f2,f3,nationkey,regionkey))
> group by regionkey
> {code}
>
> We see that we are ordering by the same field as the group by field
> (regionkey). Now the query may be non-sensicle as there is no point of
> collecting the same field with in the same group, but syntactically its a
> valid SQL and should produce correct plan, but it generates this plan instead:
> {code:java}
> LogicalAggregate(group=[{0}], EXPR$1=[COLLECT($0) WITHIN GROUP ([1])])
> LogicalProject(REGIONKEY=[$4], F1=[$0])
> LogicalProject(F1=[$0], F2=[$1], F3=[$2], NATIONKEY=[$3], REGIONKEY=[$4])
> LogicalValues(tuples=[[{ 'blah', 'blah', 'blah', 1, 1 }]])
> {code}
>
> If we run the following *good* query, it generates correct plan:
>
> {code:java}
> select nationkey, collect(regionkey)
> within group (order by regionkey)
> from (SELECT * FROM (VALUES('blah','blah','blah',1,1)) as
> TBL(f1,f2,f3,nationkey,regionkey))
> group by regionkey{code}
>
> Generated plan:
> {noformat}
> LogicalAggregate(group=[{0}], EXPR$1=[COLLECT($1) WITHIN GROUP ([1])])
> LogicalProject(NATIONKEY=[$3], REGIONKEY=[$4])
> LogicalProject(F1=[$0], F2=[$1], F3=[$2], NATIONKEY=[$3], REGIONKEY=[$4])
> LogicalValues(tuples=[[{ 'blah', 'blah', 'blah', 1, 1 }]]){noformat}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)