Darpan Lunagariya (e6data computing) created CALCITE-7551:
-------------------------------------------------------------

             Summary: Project/Filter/Join transpose and merge rules can 
duplicate non-deterministic expressions (e.g. RAND())
                 Key: CALCITE-7551
                 URL: https://issues.apache.org/jira/browse/CALCITE-7551
             Project: Calcite
          Issue Type: Bug
    Affects Versions: 1.41.0
            Reporter: Darpan Lunagariya (e6data computing)
            Assignee: Darpan Lunagariya (e6data computing)


h2. Description 
   
Several optimizer and RelBuilder-level transformations inline a 
lower-projection expression into multiple positions of an upper expression 
without checking whether the inlined expression is deterministic. When the 
lower expression
is non-deterministic (e.g. RAND(), CURRENT_TIMESTAMP, sequence NEXTVAL, UUID 
generators), this silently converts one evaluation into many, changing query 
semantics.

h2. Minimal repro

{code:sql}
SELECT a, a + 1 AS b
FROM (SELECT rand() AS a)
{code}

Current plan produced by SqlToRelConverter:

{noformat}
LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
LogicalValues(tuples=[[{ 0 }]])
{noformat}

The two RAND() calls are evaluated independently, so B - A is no longer always 
{{1}}. The expected plan preserves the single evaluation by referencing the 
projected column:

{noformat}
LogicalProject(A=[$0], B=[+($0, 1)])
LogicalProject(A=[RAND()])
    LogicalValues(tuples=[[{ 0 }]])
{noformat}

Affected code paths (confirmed by reproducer tests)

┌───────────────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                 Component                 │                                   
                                                    Mechanism                   
                                                                    │
├───────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {{RelOptUtil.pushPastProjectUnlessBloat}} │ The shuttle returned by 
{{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the underlying 
projection. Only {{RexOver}} is guarded; determinism is not.                   │
├───────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {{RelBuilder.project_}} (bloat merge)     │ Calls 
{{pushPastProjectUnlessBloat}} when merging adjacent {{Project}}s. This is why 
the bug surfaces in the very first {{RelNode}} produced by 
{{SqlToRelConverter}}, before any     │
│                                           │ rule fires.                       
                                                                                
                                                                    │
├───────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {{ProjectMergeRule}}                      │ Calls 
{{pushPastProjectUnlessBloat}}. Test: 1 → 2 {{RAND()}} calls.                   
                                                                                
                │
├───────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {{FilterProjectTransposeRule}}            │ Calls 
{{pushPastProjectUnlessBloat}} on the filter condition, and re-emits the 
original projection above the new filter. Even with the helper guarded, the 
rebuilt {{Project(rand()   │
│                                           │ AS a)}} on top of {{Filter(rand() 
> ...)}} splits one evaluation into two. Test: 1 → 2.                           
                                                                    │
├───────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {{JoinProjectTransposeRule}}              │ Builds {{RexProgram}}s for 
projection + join condition, merges them via 
{{RexProgramBuilder.mergePrograms}}, then calls 
{{mergedProgram.expandLocalRef(condition)}}. The expansion    │
│                                           │ flattens local refs back into a 
{{RexNode}} tree, inlining the underlying expression at every occurrence. Test: 
1 → 3.                                                                │
└───────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Confirmed unaffected (by the same tests)

- {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as 
{{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}} 
referenced multiple times. Sharing is preserved; useful property to think about
when designing a general fix.
- {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
- {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the 
projected expression is non-reducible.

h2. Proposed fix

Three small changes, all guarding the existing inline-substitution paths 
against duplication of non-deterministic sub-expressions:

{{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per 
bottom slot; if any bottom expression is non-deterministic 
({{!RexUtil.isDeterministic(...)}}) and is referenced more than once, return 
{{null}}. Fixes 
{{ProjectMergeRule}}, {{FilterProjectTransposeRule}}'s helper call, and the 
{{RelBuilder}} bloat merge in one place.

{{FilterProjectTransposeRule}} — additionally skip when any projected 
expression is non-deterministic, parallel to the existing {{containsOver}} 
skip. Required because the rule re-emits the original projection above the new 
filter.

{{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel to 
the existing {{containsOver}} skip on each side.

h2. Reproducer

A new test class {{NonDeterministicDuplicationTest}} builds each rule's input 
programmatically via {{RelBuilder}} (with {{bloat = -1}} so the build-time 
merge does not pre-flatten), applies a single rule via a {{HepPlanner}}, and
asserts the count of {{RAND(}} occurrences in the output plan does not increase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to