lincoln lee created FLINK-39720:
-----------------------------------
Summary: SubQueryDecorrelator produces incorrect plans for
correlated EXISTS with HAVING on aggregate outputs
Key: FLINK-39720
URL: https://issues.apache.org/jira/browse/FLINK-39720
Project: Flink
Issue Type: Bug
Components: Table SQL / Planner
Affects Versions: 2.2.1, 1.20.4, 2.3.0
Reporter: lincoln lee
Assignee: lincoln lee
SubQueryDecorrelator.decorrelateRel(LogicalFilter) reattaches the
non-correlated remainder of a Filter condition to the rewritten input without
remapping its RexInputRefs through frame.oldToNewOutputs. When the child
LogicalAggregate has had correlated columns injected into its group key (which
shifts the positions of the aggregate-output fields), surviving HAVING / Filter
predicates silently point at the wrong column. The resulting plan is
structurally valid but semantically wrong.
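
To make the index drift concrete, here is a minimal Python sketch of the
remapping step. It is not Flink or Calcite code: InputRef, Call and remap are
hypothetical stand-ins for RexInputRef, RexCall and a shuttle-style rewrite,
and the field layout (f, d, SUM(e)) after injection is an assumption chosen
to match the reported symptom, where $1 ends up pointing at r.d.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputRef:
    """Hypothetical stand-in for a positional field reference ($index)."""
    index: int


@dataclass(frozen=True)
class Call:
    """Hypothetical stand-in for an operator call such as >=($1, 3)."""
    op: str
    operands: tuple


def remap(expr, old_to_new):
    """Rewrite every InputRef through the old -> new output mapping."""
    if isinstance(expr, InputRef):
        return InputRef(old_to_new[expr.index])
    if isinstance(expr, Call):
        return Call(expr.op, tuple(remap(o, old_to_new) for o in expr.operands))
    return expr  # literals pass through unchanged


def referenced_fields(expr, fields):
    """Resolve the field names an expression reads in a given row layout."""
    if isinstance(expr, InputRef):
        return [fields[expr.index]]
    if isinstance(expr, Call):
        return [n for o in expr.operands for n in referenced_fields(o, fields)]
    return []


# Aggregate output before decorrelation: GROUP BY f, then SUM(e).
old_fields = ["f", "SUM(e)"]
# Assumed output after the correlated column d is injected into the group key.
new_fields = ["f", "d", "SUM(e)"]
# Assumed mapping, playing the role of frame.oldToNewOutputs.
old_to_new = {0: 0, 1: 2}

having = Call(">=", (InputRef(1), 3))   # written against old_fields
stale = having                          # reattached without remapping
fixed = remap(having, old_to_new)       # reattached after remapping

print(referenced_fields(stale, new_fields))
print(referenced_fields(fixed, new_fields))
```

Under these assumptions the unremapped predicate resolves to d in the new
layout, while the remapped one still resolves to SUM(e).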
Reproduction
Schema (matches SubQuerySemiJoinTest):
l(a INT, b BIGINT, c VARCHAR), r(d INT, e BIGINT, f VARCHAR).
    SELECT * FROM l
    WHERE EXISTS (
      SELECT 1 FROM r
      WHERE l.a = r.d          -- correlated WHERE
      GROUP BY r.f
      HAVING SUM(r.e) >= 3     -- non-correlated HAVING on aggregate output
    );
Expected: HAVING applies to the SUM(r.e) column.
Actual (before fix): HAVING applies to the injected r.d group-key column
(>=($1, 3) where $1 is now r.d, not SUM(r.e)). Plan is silently wrong.
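
The effect on query results can be illustrated with a small Python
simulation. This is only a sketch: the rows are made-up sample data, the
decorrelated aggregate is assumed to emit (f, d, SUM(e)), and semi_join is a
hypothetical helper that models the semi join on l.a = r.d, not the actual
plan Flink produces.

```python
from collections import defaultdict

# Made-up sample rows of r as (d, e, f); l is reduced to its l.a values.
r_rows = [(1, 10, "x"), (5, 1, "y"), (5, 1, "y")]
l_keys = [1, 5]

# Decorrelated aggregate, assumed layout: (f, d, SUM(e)).
sums = defaultdict(int)
for d, e, f in r_rows:
    sums[(f, d)] += e
agg_rows = [(f, d, total) for (f, d), total in sums.items()]


def semi_join(l_keys, agg_rows, having_index):
    """Keep l.a values that match a group passing `row[having_index] >= 3`."""
    matched = {row[1] for row in agg_rows if row[having_index] >= 3}
    return [a for a in l_keys if a in matched]


correct = semi_join(l_keys, agg_rows, 2)  # predicate on SUM(e)
wrong = semi_join(l_keys, agg_rows, 1)    # predicate on injected d

print(correct)  # [1]
print(wrong)    # [5]
```

With this data the drifted predicate both drops a row that should match
(a = 1, SUM(e) = 10) and keeps one that should not (a = 5, SUM(e) = 2), and
no error is raised in either case.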
Other shapes that trigger the same drift:
- Compound HAVING: HAVING SUM(r.e) >= 3 AND MAX(r.e) < 100
- Mixed agg + COUNT: HAVING SUM(r.e) >= 3 AND COUNT(*) > 1
- Multiple correlated cols: WHERE l.a = r.d AND l.b = r.e ...
  HAVING COUNT(r.d) >= 2
--
This message was sent by Atlassian Jira
(v8.20.10#820010)