nathanb9 opened a new pull request, #22675:
URL: https://github.com/apache/datafusion/pull/22675

   ## Which issue does this PR close?
   
   - Partially addresses #17737
   
   ## Rationale for this change
   
   A CTE that is referenced more than once is currently recomputed for each
   reference. This PR adds infrastructure to compute such a CTE once, cache the
   result, and share it across all consumers.
   
   ## What changes are included in this PR?
   
   Introduces CTE materialization with the following constructs:
   
   ### Logical Nodes (`datafusion-expr`)
   ```rust
   pub struct MaterializedCteProducer {
       pub name: String,
       pub cte_plan: Arc<LogicalPlan>,
       pub continuation: Arc<LogicalPlan>,
       pub schema: DFSchemaRef,
       pub force_materialized: bool,
   }
   
   pub struct MaterializedCteReader {
       pub name: String,
       pub schema: DFSchemaRef,
   }
   ```
   
   ### Physical Operators (`datafusion-physical-plan`)
   ```rust
   pub struct MaterializedCteCache {
       name: String,
       once: OnceAsync<Vec<Vec<RecordBatch>>>,
   }
   
   pub struct MaterializedCteExec { ... }       // materializes + runs continuation
   pub struct MaterializedCteReaderExec { ... } // reads from shared cache
   ```
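
   The compute-once behaviour behind the shared cache can be illustrated with a
   std-only sketch. This is not the PR's implementation: it assumes a
   hypothetical `CteCache` type, swaps `OnceAsync` for the synchronous
   `std::sync::OnceLock`, and uses plain `i64` rows in place of `RecordBatch`.
   The `computations` counter exists only to make the single evaluation
   observable.

   ```rust
   use std::sync::atomic::{AtomicUsize, Ordering};
   use std::sync::{Arc, OnceLock};
   use std::thread;

   /// Hypothetical stand-in for the cache: one slot holding the CTE result
   /// as partitions of rows (`Vec<Vec<i64>>` instead of `Vec<Vec<RecordBatch>>`).
   struct CteCache {
       once: OnceLock<Vec<Vec<i64>>>,
       /// Illustration only: counts how many times the CTE body actually ran.
       computations: AtomicUsize,
   }

   impl CteCache {
       fn new() -> Self {
           Self {
               once: OnceLock::new(),
               computations: AtomicUsize::new(0),
           }
       }

       /// The first caller runs `body`; every other caller (including
       /// concurrent ones) gets a reference to the same cached result.
       fn get_or_compute(&self, body: impl FnOnce() -> Vec<Vec<i64>>) -> &Vec<Vec<i64>> {
           self.once.get_or_init(|| {
               self.computations.fetch_add(1, Ordering::SeqCst);
               body()
           })
       }
   }

   /// Spawns `n` readers that all consume the same cached CTE result.
   /// Returns how often the body ran and the row sum each reader observed.
   fn run_readers(n: usize) -> (usize, Vec<i64>) {
       let cache = Arc::new(CteCache::new());
       let handles: Vec<_> = (0..n)
           .map(|_| {
               let cache = Arc::clone(&cache);
               thread::spawn(move || {
                   let parts = cache.get_or_compute(|| vec![vec![1, 2], vec![3]]);
                   let total: i64 = parts.iter().flatten().sum();
                   total
               })
           })
           .collect();
       let sums: Vec<i64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
       (cache.computations.load(Ordering::SeqCst), sums)
   }

   fn main() {
       let (computations, sums) = run_readers(4);
       println!("body ran {computations} time(s); reader sums = {sums:?}");
   }
   ```

   Each reader holds only an `Arc` to the cache, so the result lives as long as
   any consumer still needs it. An async variant would await the shared result
   instead of blocking a thread.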
   
   ### Extension Planner (`datafusion-core`)
   ```rust
   pub struct MaterializedCtePlanner { ... }   // bridges logical → physical
   ```
   
   ### SQL Planner
   - Wraps all multi-ref CTEs in Producer/Reader nodes when `enable_materialized_ctes = true`
   - Skips cheap non-volatile CTEs (literals, empty relations)
   - Respects `MATERIALIZED` / `NOT MATERIALIZED` SQL hints
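
   The three rules above can be read as a single decision function. The sketch
   below is one possible reading, not the planner code from this PR: `Hint`,
   `CteInfo`, and `should_materialize` are hypothetical names, and it assumes
   that the config flag gates the hints as well and that an explicit hint takes
   precedence over the reference-count and cost heuristics.

   ```rust
   /// Hypothetical model of the SQL hint attached to a CTE.
   #[derive(Clone, Copy, Debug, PartialEq)]
   enum Hint {
       None,
       Materialized,
       NotMaterialized,
   }

   /// Hypothetical summary of what the planner knows about one CTE.
   #[derive(Clone, Copy, Debug)]
   struct CteInfo {
       /// Number of references to the CTE in the query.
       ref_count: usize,
       hint: Hint,
       /// True for trivially cheap bodies (literals, empty relations).
       is_cheap: bool,
       /// True if the body contains a volatile function.
       is_volatile: bool,
   }

   /// Decides whether to wrap the CTE in Producer/Reader nodes.
   fn should_materialize(enabled: bool, cte: CteInfo) -> bool {
       // Assumption: with the flag off, nothing is materialized, hints included.
       if !enabled {
           return false;
       }
       match cte.hint {
           // Assumption: explicit hints override the heuristics below.
           Hint::Materialized => true,
           Hint::NotMaterialized => false,
           // Default: multi-referenced CTEs, except cheap non-volatile ones.
           Hint::None => cte.ref_count > 1 && !(cte.is_cheap && !cte.is_volatile),
       }
   }

   fn main() {
       let multi = CteInfo {
           ref_count: 2,
           hint: Hint::None,
           is_cheap: false,
           is_volatile: false,
       };
       let cheap = CteInfo { is_cheap: true, ..multi };
       println!("multi-ref CTE: {}", should_materialize(true, multi));
       println!("cheap literal CTE: {}", should_materialize(true, cheap));
   }
   ```

   Note that a cheap but volatile body is still materialized under this reading,
   so that every reference observes the same values.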
   
   ### Config
   ```
   datafusion.execution.enable_materialized_ctes = false (default, opt-in for now)
   ```
   
   **The feature is disabled by default** in this initial PR. Follow-up PRs will:
   - Add an `InlineCte` optimizer rule (smart inlining heuristic)
   - Add a `CteFilterPusher` optimizer rule (OR-combined filter pushdown)
   - Add MemoryPool integration
   - Enable the feature by default
   
   ## Are these changes tested?
   
   Yes. Integration tests cover materialization, partition preservation, cache
   isolation, volatile function semantics, and statistics propagation.
   
   ## Are there any user-facing changes?
   
   Yes. This PR adds a new config flag,
   `datafusion.execution.enable_materialized_ctes`, and support for the SQL hints
   `AS MATERIALIZED` / `AS NOT MATERIALIZED`. The feature is disabled by default.



