[ 
https://issues.apache.org/jira/browse/CALCITE-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darpan Lunagariya (e6data computing) updated CALCITE-7551:
----------------------------------------------------------
    Description: 
h2. Description

Several optimizer and RelBuilder-level transformations inline a 
lower-projection expression into multiple positions of an upper expression 
without checking whether the inlined expression is deterministic. When the 
lower expression
is non-deterministic (e.g. RAND(), CURRENT_TIMESTAMP, sequence NEXTVAL, UUID 
generators), this silently converts one evaluation into many, changing query 
semantics.
h2. Minimal repro
{code:sql}
SELECT a, a + 1 AS b
FROM (SELECT rand() AS a)
{code}
Current plan produced by SqlToRelConverter:
{noformat}
LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
LogicalValues(tuples=[[{ 0 }]])
{noformat}
The two RAND() calls are evaluated independently, so B - A is no longer always 
{{{}1{}}}. The expected plan preserves the single evaluation by referencing the 
projected column:
{noformat}
LogicalProject(A=[$0], B=[+($0, 1)])
LogicalProject(A=[RAND()])
    LogicalValues(tuples=[[{ 0 }]])
{noformat}

h2. Affected code paths (confirmed by reproducer tests)


||Heading 1||Heading 2||
|||Component||Mechanism||
│{{RelOptUtil.pushPastProjectUnlessBloat}}│The shuttle returned by 
{{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the underlying 
projection. Only {{RexOver}} is guarded; determinism is not.|
│{{RelBuilder.project_}} (bloat merge)│Calls {{pushPastProjectUnlessBloat}} 
when merging adjacent {{Project}}s. This is why the bug surfaces in the very 
first {{RelNode}} produced by {{SqlToRelConverter}}, before any rule fires.|
│{{ProjectMergeRule}}│Calls {{pushPastProjectUnlessBloat}}. Test: 1 → 2 
{{RAND()}} calls.|
│{{FilterProjectTransposeRule}}│Calls {{pushPastProjectUnlessBloat}} on the 
filter condition, and re-emits the original projection above the new filter. 
Even with the helper guarded, the rebuilt {{Project(rand() AS a)}} on top of 
{{Filter(rand() > ...)}} splits one evaluation into two. Test: 1 → 2.|
│{{JoinProjectTransposeRule}}│Builds {{RexProgram}}s for projection + join 
condition, merges them via {{RexProgramBuilder.mergePrograms}}, then calls 
{{mergedProgram.expandLocalRef(condition)}}. The expansion flattens local refs 
back into a {{RexNode}} tree, inlining the underlying expression at every 
occurrence. Test: 1 → 3.||Col A2|


Confirmed unaffected (by the same tests)
 - {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as 
{{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}} 
referenced multiple times. Sharing is preserved; useful property to think about
when designing a general fix.
 - {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
 - {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the 
projected expression is non-reducible.

h2. Proposed fix

Three small changes, all guarding the existing inline-substitution paths 
against duplication of non-deterministic sub-expressions:

{{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per 
bottom slot; if any bottom expression is non-deterministic 
({{{}!RexUtil.isDeterministic(...){}}}) and is referenced more than once, 
return {{{}null{}}}. Fixes 
{{{}ProjectMergeRule{}}}, {{{}FilterProjectTransposeRule{}}}'s helper call, and 
the {{RelBuilder}} bloat merge in one place.

{{FilterProjectTransposeRule}} — additionally skip when any projected 
expression is non-deterministic, parallel to the existing {{containsOver}} 
skip. Required because the rule re-emits the original projection above the new 
filter.

{{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel to 
the existing {{containsOver}} skip on each side.
h2. Reproducer

A new test class {{NonDeterministicDuplicationTest}} builds each rule's input 
programmatically via {{RelBuilder}} (with {{bloat = -1}} so the build-time 
merge does not pre-flatten), applies a single rule via a {{{}HepPlanner{}}}, and
asserts the count of {{RAND(}} occurrences in the output plan does not increase.


||Heading 1||Heading 2||
|Col A1|Col A2|


  was:
h2. Description

Several optimizer and RelBuilder-level transformations inline a 
lower-projection expression into multiple positions of an upper expression 
without checking whether the inlined expression is deterministic. When the 
lower expression
is non-deterministic (e.g. RAND(), CURRENT_TIMESTAMP, sequence NEXTVAL, UUID 
generators), this silently converts one evaluation into many, changing query 
semantics.
h2. Minimal repro
{code:sql}
SELECT a, a + 1 AS b
FROM (SELECT rand() AS a)
{code}
Current plan produced by SqlToRelConverter:
{noformat}
LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
LogicalValues(tuples=[[{ 0 }]])
{noformat}
The two RAND() calls are evaluated independently, so B - A is no longer always 
{{{}1{}}}. The expected plan preserves the single evaluation by referencing the 
projected column:
{noformat}
LogicalProject(A=[$0], B=[+($0, 1)])
LogicalProject(A=[RAND()])
    LogicalValues(tuples=[[{ 0 }]])
{noformat}

h2. Affected code paths (confirmed by reproducer tests)

||Component||Mechanism||
│{{RelOptUtil.pushPastProjectUnlessBloat}}│The shuttle returned by 
{{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the underlying 
projection. Only {{RexOver}} is guarded; determinism is not.|

│{{RelBuilder.project_}} (bloat merge)│Calls {{pushPastProjectUnlessBloat}} 
when merging adjacent {{Project}}s. This is why the bug surfaces in the very 
first {{RelNode}} produced by {{SqlToRelConverter}}, before any rule fires.|
│{{ProjectMergeRule}}│Calls {{pushPastProjectUnlessBloat}}. Test: 1 → 2 
{{RAND()}} calls.|
│{{FilterProjectTransposeRule}}│Calls {{pushPastProjectUnlessBloat}} on the 
filter condition, and re-emits the original projection above the new filter. 
Even with the helper guarded, the rebuilt {{Project(rand() AS a)}} on top of 
{{Filter(rand() > ...)}} splits one evaluation into two. Test: 1 → 2.|
│{{JoinProjectTransposeRule}}│Builds {{RexProgram}}s for projection + join 
condition, merges them via {{RexProgramBuilder.mergePrograms}}, then calls 
{{mergedProgram.expandLocalRef(condition)}}. The expansion flattens local refs 
back into a {{RexNode}} tree, inlining the underlying expression at every 
occurrence. Test: 1 → 3.|

Confirmed unaffected (by the same tests)
 - {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as 
{{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}} 
referenced multiple times. Sharing is preserved; useful property to think about
when designing a general fix.
 - {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
 - {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the 
projected expression is non-reducible.

h2. Proposed fix

Three small changes, all guarding the existing inline-substitution paths 
against duplication of non-deterministic sub-expressions:

{{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per 
bottom slot; if any bottom expression is non-deterministic 
({{{}!RexUtil.isDeterministic(...){}}}) and is referenced more than once, 
return {{{}null{}}}. Fixes 
{{{}ProjectMergeRule{}}}, {{{}FilterProjectTransposeRule{}}}'s helper call, and 
the {{RelBuilder}} bloat merge in one place.

{{FilterProjectTransposeRule}} — additionally skip when any projected 
expression is non-deterministic, parallel to the existing {{containsOver}} 
skip. Required because the rule re-emits the original projection above the new 
filter.

{{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel to 
the existing {{containsOver}} skip on each side.
h2. Reproducer

A new test class {{NonDeterministicDuplicationTest}} builds each rule's input 
programmatically via {{RelBuilder}} (with {{bloat = -1}} so the build-time 
merge does not pre-flatten), applies a single rule via a {{{}HepPlanner{}}}, and
asserts the count of {{RAND(}} occurrences in the output plan does not increase.


||Heading 1||Heading 2||
|Col A1|Col A2|



> Project/Filter/Join transpose and merge rules can duplicate non-deterministic 
> expressions (e.g. RAND())
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7551
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7551
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.41.0
>            Reporter: Darpan Lunagariya (e6data computing)
>            Assignee: Darpan Lunagariya (e6data computing)
>            Priority: Major
>
> h2. Description
> Several optimizer and RelBuilder-level transformations inline a 
> lower-projection expression into multiple positions of an upper expression 
> without checking whether the inlined expression is deterministic. When the 
> lower expression
> is non-deterministic (e.g. RAND(), CURRENT_TIMESTAMP, sequence NEXTVAL, UUID 
> generators), this silently converts one evaluation into many, changing query 
> semantics.
> h2. Minimal repro
> {code:sql}
> SELECT a, a + 1 AS b
> FROM (SELECT rand() AS a)
> {code}
> Current plan produced by SqlToRelConverter:
> {noformat}
> LogicalProject(A=[RAND()], B=[+(RAND(), 1)])
> LogicalValues(tuples=[[{ 0 }]])
> {noformat}
> The two RAND() calls are evaluated independently, so B - A is no longer 
> always {{{}1{}}}. The expected plan preserves the single evaluation by 
> referencing the projected column:
> {noformat}
> LogicalProject(A=[$0], B=[+($0, 1)])
> LogicalProject(A=[RAND()])
>     LogicalValues(tuples=[[{ 0 }]])
> {noformat}
> h2. Affected code paths (confirmed by reproducer tests)
> ||Heading 1||Heading 2||
> |||Component||Mechanism||
> │{{RelOptUtil.pushPastProjectUnlessBloat}}│The shuttle returned by 
> {{pushShuttle(Project)}} substitutes every {{RexInputRef}} with the 
> underlying projection. Only {{RexOver}} is guarded; determinism is not.|
> │{{RelBuilder.project_}} (bloat merge)│Calls {{pushPastProjectUnlessBloat}} 
> when merging adjacent {{Project}}s. This is why the bug surfaces in the very 
> first {{RelNode}} produced by {{SqlToRelConverter}}, before any rule fires.|
> │{{ProjectMergeRule}}│Calls {{pushPastProjectUnlessBloat}}. Test: 1 → 2 
> {{RAND()}} calls.|
> │{{FilterProjectTransposeRule}}│Calls {{pushPastProjectUnlessBloat}} on the 
> filter condition, and re-emits the original projection above the new filter. 
> Even with the helper guarded, the rebuilt {{Project(rand() AS a)}} on top of 
> {{Filter(rand() > ...)}} splits one evaluation into two. Test: 1 → 2.|
> │{{JoinProjectTransposeRule}}│Builds {{RexProgram}}s for projection + join 
> condition, merges them via {{RexProgramBuilder.mergePrograms}}, then calls 
> {{mergedProgram.expandLocalRef(condition)}}. The expansion flattens local 
> refs back into a {{RexNode}} tree, inlining the underlying expression at 
> every occurrence. Test: 1 → 3.||Col A2|
> Confirmed unaffected (by the same tests)
>  - {{CalcMergeRule}} — {{RexProgram}} represents intermediate values as 
> {{RexLocalRef}}s, so the merged program contains a single {{expr#N=[RAND()]}} 
> referenced multiple times. Sharing is preserved; useful property to think 
> about
> when designing a general fix.
>  - {{ProjectReduceExpressionsRule}} — refuses to fold non-deterministic terms.
>  - {{ProjectValuesMergeRule}} / {{ValuesReduceRule}} — does not fire when the 
> projected expression is non-reducible.
> h2. Proposed fix
> Three small changes, all guarding the existing inline-substitution paths 
> against duplication of non-deterministic sub-expressions:
> {{RelOptUtil.pushPastProjectUnlessBloat}} — count {{RexInputRef}} uses per 
> bottom slot; if any bottom expression is non-deterministic 
> ({{{}!RexUtil.isDeterministic(...){}}}) and is referenced more than once, 
> return {{{}null{}}}. Fixes 
> {{{}ProjectMergeRule{}}}, {{{}FilterProjectTransposeRule{}}}'s helper call, 
> and the {{RelBuilder}} bloat merge in one place.
> {{FilterProjectTransposeRule}} — additionally skip when any projected 
> expression is non-deterministic, parallel to the existing {{containsOver}} 
> skip. Required because the rule re-emits the original projection above the 
> new filter.
> {{JoinProjectTransposeRule}} — skip non-deterministic projections, parallel 
> to the existing {{containsOver}} skip on each side.
> h2. Reproducer
> A new test class {{NonDeterministicDuplicationTest}} builds each rule's input 
> programmatically via {{RelBuilder}} (with {{bloat = -1}} so the build-time 
> merge does not pre-flatten), applies a single rule via a {{{}HepPlanner{}}}, 
> and
> asserts the count of {{RAND(}} occurrences in the output plan does not 
> increase.
> ||Heading 1||Heading 2||
> |Col A1|Col A2|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to