Darpan Lunagariya (e6data computing) created CALCITE-7551:
-------------------------------------------------------------
Summary: Project/Filter/Join transpose and merge rules can
duplicate non-deterministic expressions (e.g. RAND())
Key: CALCITE-7551
URL: https://issues.apache.org/jira/browse/CALCITE-7551
Project: Calcite
Issue Type: Bug
Affects Versions: 1.41.0
Reporter: Darpan Lunagariya (e6data computing)
Assignee: Darpan Lunagariya (e6data computing)
h2. Description
Several optimizer and RelBuilder-level transformations inline a
lower-projection expression into multiple positions of an upper expression
without checking whether the inlined expression is deterministic. When the
lower expression
is non-deterministic (e.g. RAND(), CURRENT_TIMESTAMP, sequence NEXTVAL, UUID
generators), this silently converts one evaluation into many, changing query
semantics.
h2. Minimal repro
{code:sql}
SELECT a, a + 1 AS b
FROM (SELECT rand() AS a)
{code}
Current plan produced by SqlToRelConverter:
{noformat}
LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
LogicalValues(tuples=[[{ 0 }]])
{noformat}
The two RAND() calls are evaluated independently, so B - A is no longer always
{{1}}. The expected plan preserves the single evaluation by referencing the
projected column:
{noformat}
LogicalProject(A=[$0], B=[+($0, 1)])
LogicalProject(A=[RAND()])
LogicalValues(tuples=[[{ 0 }]])
{noformat}
Affected code paths (confirmed by reproducer tests)
┌───────────────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Component │
Mechanism
│
├───────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {{RelOptUtil.pushPastProjectUnlessBloat}} │ The shuttle returned by
{{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the underlying
projection. Only {{RexOver}} is guarded; determinism is not. │
├───────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {{RelBuilder.project_}} (bloat merge) │ Calls
{{pushPastProjectUnlessBloat}} when merging adjacent {{Project}}s. This is why
the bug surfaces in the very first {{RelNode}} produced by
{{SqlToRelConverter}}, before any │
│ │ rule fires.
│
├───────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {{ProjectMergeRule}} │ Calls
{{pushPastProjectUnlessBloat}}. Test: 1 → 2 {{RAND()}} calls.
│
├───────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {{FilterProjectTransposeRule}} │ Calls
{{pushPastProjectUnlessBloat}} on the filter condition, and re-emits the
original projection above the new filter. Even with the helper guarded, the
rebuilt {{Project(rand() │
│ │ AS a)}} on top of {{Filter(rand()
> ...)}} splits one evaluation into two. Test: 1 → 2.
│
├───────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {{JoinProjectTransposeRule}} │ Builds {{RexProgram}}s for
projection + join condition, merges them via
{{RexProgramBuilder.mergePrograms}}, then calls
{{mergedProgram.expandLocalRef(condition)}}. The expansion │
│ │ flattens local refs back into a
{{RexNode}} tree, inlining the underlying expression at every occurrence. Test:
1 → 3. │
└───────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Confirmed unaffected (by the same tests)
- {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as
{{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}}
referenced multiple times. Sharing is preserved; useful property to think about
when designing a general fix.
- {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
- {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the
projected expression is non-reducible.
h2. Proposed fix
Three small changes, all guarding the existing inline-substitution paths
against duplication of non-deterministic sub-expressions:
{{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per
bottom slot; if any bottom expression is non-deterministic
({{!RexUtil.isDeterministic(...)}}) and is referenced more than once, return
{{null}}. Fixes
{{ProjectMergeRule}}, {{FilterProjectTransposeRule}}'s helper call, and the
{{RelBuilder}} bloat merge in one place.
{{FilterProjectTransposeRule}} — additionally skip when any projected
expression is non-deterministic, parallel to the existing {{containsOver}}
skip. Required because the rule re-emits the original projection above the new
filter.
{{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel to
the existing {{containsOver}} skip on each side.
h2. Reproducer
A new test class {{NonDeterministicDuplicationTest}} builds each rule's input
programmatically via {{RelBuilder}} (with {{bloat = -1}} so the build-time
merge does not pre-flatten), applies a single rule via a {{HepPlanner}}, and
asserts the count of {{RAND(}} occurrences in the output plan does not increase.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)