nathanb9 opened a new pull request, #22675:
URL: https://github.com/apache/datafusion/pull/22675
## Which issue does this PR close?
- Partially addresses #17737
## Rationale for this change
Today, the body of a multi-referenced CTE is recomputed once per reference.
This PR adds infrastructure to compute the body once, cache the results, and
share them across all consumers.
## What changes are included in this PR?
Introduces CTE materialization with the following constructs:
### Logical Nodes (`datafusion-expr`)
```rust
pub struct MaterializedCteProducer {
    pub name: String,
    pub cte_plan: Arc<LogicalPlan>,
    pub continuation: Arc<LogicalPlan>,
    pub schema: DFSchemaRef,
    pub force_materialized: bool,
}
pub struct MaterializedCteReader {
    pub name: String,
    pub schema: DFSchemaRef,
}
```
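To illustrate how the two logical nodes might fit together, here is a minimal sketch using a toy plan tree rather than DataFusion's `LogicalPlan`. It assumes that readers sit inside the producer's `continuation` and are linked to it by `name`; that reading is inferred from the field names above and is not confirmed by this description. `Plan`, `count_readers`, and `self_join_plan` are hypothetical names used only for illustration.

```rust
use std::sync::Arc;

/// Toy plan tree (hypothetical). Only the two CTE variants mirror the PR's field names.
enum Plan {
    Scan(String),
    Join(Arc<Plan>, Arc<Plan>),
    Producer {
        name: String,
        cte_plan: Arc<Plan>,
        continuation: Arc<Plan>,
    },
    Reader {
        name: String,
    },
}

/// Counts `Reader` nodes named `target` anywhere beneath `plan`.
fn count_readers(plan: &Plan, target: &str) -> usize {
    match plan {
        Plan::Scan(_) => 0,
        Plan::Join(left, right) => count_readers(left, target) + count_readers(right, target),
        Plan::Producer { cte_plan, continuation, .. } => {
            count_readers(cte_plan, target) + count_readers(continuation, target)
        }
        Plan::Reader { name } => usize::from(name.as_str() == target),
    }
}

/// Builds the assumed shape for a query that joins a CTE `t` with itself:
/// one producer holding the CTE body, and a continuation with two readers.
fn self_join_plan() -> Plan {
    let reader = || Arc::new(Plan::Reader { name: "t".to_string() });
    Plan::Producer {
        name: "t".to_string(),
        cte_plan: Arc::new(Plan::Scan("orders".to_string())),
        continuation: Arc::new(Plan::Join(reader(), reader())),
    }
}
```

Under this assumed layout, the CTE body appears once in the tree no matter how many readers the continuation contains.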
### Physical Operators (`datafusion-physical-plan`)
```rust
pub struct MaterializedCteCache {
    name: String,
    once: OnceAsync<Vec<Vec<RecordBatch>>>,
}
pub struct MaterializedCteExec { ... }       // materializes + runs continuation
pub struct MaterializedCteReaderExec { ... } // reads from shared cache
```
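The compute-once, share-everywhere behaviour of the cache can be sketched with the standard library alone. This is not the PR's implementation: `std::sync::OnceLock` stands in for DataFusion's `OnceAsync`, `Batch` stands in for `RecordBatch`, and `CteCache`, `CteReader`, and `demo` are hypothetical names. The outer `Vec` is assumed to hold one entry per partition, matching the `Vec<Vec<RecordBatch>>` shape above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Stand-in for an Arrow `RecordBatch` (hypothetical, for illustration only).
type Batch = Vec<i64>;

/// Simplified cache: the outer Vec is assumed to hold one entry per partition.
/// `OnceLock` is a synchronous stand-in for the PR's `OnceAsync`.
struct CteCache {
    name: String,
    body: Box<dyn Fn() -> Vec<Vec<Batch>> + Send + Sync>,
    once: OnceLock<Vec<Vec<Batch>>>,
}

impl CteCache {
    fn new(name: &str, body: impl Fn() -> Vec<Vec<Batch>> + Send + Sync + 'static) -> Self {
        Self {
            name: name.to_string(),
            body: Box::new(body),
            once: OnceLock::new(),
        }
    }

    /// Evaluates the CTE body on the first call only; later calls reuse the result.
    fn partitions(&self) -> &Vec<Vec<Batch>> {
        self.once.get_or_init(|| (self.body)())
    }
}

/// A reader holds a shared handle to the cache and copies out one partition.
struct CteReader {
    cache: Arc<CteCache>,
}

impl CteReader {
    fn read_partition(&self, partition: usize) -> Vec<Batch> {
        self.cache.partitions()[partition].clone()
    }
}

/// Two readers over one cache; returns how many times the body ran plus both reads.
fn demo() -> (usize, Vec<Batch>, Vec<Batch>) {
    let runs = Arc::new(AtomicUsize::new(0));
    let counter = Arc::clone(&runs);
    let cache = Arc::new(CteCache::new("t", move || {
        counter.fetch_add(1, Ordering::SeqCst);
        vec![vec![vec![1, 2]], vec![vec![3]]] // two partitions
    }));
    let first = CteReader { cache: Arc::clone(&cache) };
    let second = CteReader { cache: Arc::clone(&cache) };
    let a = first.read_partition(0);
    let b = second.read_partition(1);
    (runs.load(Ordering::SeqCst), a, b)
}
```

In this sketch the body runs once even though two readers consume it, and each reader sees the partition layout the body produced. The same property is what would make a volatile CTE body return identical rows to every consumer.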
### Extension Planner (`datafusion-core`)
```rust
pub struct MaterializedCtePlanner { ... } // bridges logical → physical
```
### SQL Planner
- Wraps all multi-referenced CTEs in Producer/Reader nodes when `enable_materialized_ctes = true`
- Skips cheap non-volatile CTEs (literals, empty relations)
- Respects `MATERIALIZED` / `NOT MATERIALIZED` SQL hints
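The three rules above can be read as a single decision function. The sketch below is one possible interpretation, not the PR's code: it assumes the config flag gates everything, that an explicit hint takes precedence over the heuristics, and that `MATERIALIZED` forces materialization regardless of reference count. `CteHint`, `CteInfo`, and `should_materialize` are hypothetical names.

```rust
/// SQL hint attached to a CTE definition (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum CteHint {
    None,
    Materialized,
    NotMaterialized,
}

/// Hypothetical summary of what the planner knows about one CTE.
struct CteInfo {
    reference_count: usize,
    hint: CteHint,
    /// Body is trivially cheap, e.g. literals or an empty relation.
    is_cheap: bool,
    /// Body calls a volatile function.
    is_volatile: bool,
}

/// Decides whether to wrap the CTE in Producer/Reader nodes.
/// The precedence order here is an assumption, not taken from the PR.
fn should_materialize(enabled: bool, cte: &CteInfo) -> bool {
    if !enabled {
        return false;
    }
    match cte.hint {
        CteHint::Materialized => true,
        CteHint::NotMaterialized => false,
        // No hint: materialize multi-referenced CTEs unless the body is
        // both cheap and non-volatile.
        CteHint::None => cte.reference_count > 1 && !(cte.is_cheap && !cte.is_volatile),
    }
}
```

Note that a cheap but volatile body is still materialized in this sketch, since inlining it would evaluate the volatile function once per reference.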
### Config
```
datafusion.execution.enable_materialized_ctes = false (default, opt-in for now)
```
**Feature is disabled by default** for this initial PR. Follow-up PRs will add
the items below, after which the feature can be enabled by default:
- `InlineCte` optimizer rule (smart inlining heuristic)
- `CteFilterPusher` optimizer rule (OR-combined filter pushdown)
- MemoryPool integration
## Are these changes tested?
Yes. Integration tests cover materialization, partition preservation, cache
isolation, volatile function semantics, and statistics propagation.
## Are there any user-facing changes?
Yes. New config flag `datafusion.execution.enable_materialized_ctes` and SQL
hint support (`AS MATERIALIZED` / `AS NOT MATERIALIZED`). Disabled by default.