nathanb9 opened a new pull request, #22551:
URL: https://github.com/apache/datafusion/pull/22551

   ## Summary
   
   Adds support for materializing Common Table Expressions (CTEs) that are referenced more than once. When enabled, multi-referenced CTEs ending in expensive operations (Aggregate, Distinct, Window, Union) are computed once and cached in memory for reuse.
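   To make "computed once and cached in memory" concrete, here is a minimal, standard-library-only Rust sketch of the compute-once/read-many pattern. `MaterializedCte`, `get_or_compute`, and the `Rows` type are hypothetical names, not DataFusion APIs. A real operator would buffer Arrow record batches from an async stream rather than a `Vec` of tuples.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for a materialized CTE result
/// (a real implementation would hold Arrow record batches).
type Rows = Vec<(String, i64)>;

/// Hypothetical holder for one materialized CTE: the first reference
/// that asks for the result runs the producer, later references reuse it.
struct MaterializedCte {
    result: OnceLock<Arc<Rows>>,
}

impl MaterializedCte {
    fn new() -> Self {
        Self { result: OnceLock::new() }
    }

    /// Returns the cached rows, running `produce` only on first use.
    fn get_or_compute<F: FnOnce() -> Rows>(&self, produce: F) -> Arc<Rows> {
        Arc::clone(self.result.get_or_init(|| Arc::new(produce())))
    }
}

fn main() {
    let evaluations = AtomicUsize::new(0);
    let cte = MaterializedCte::new();

    // The "expensive" CTE body (e.g. an aggregate); counts how often it runs.
    let expensive_body = || {
        evaluations.fetch_add(1, Ordering::SeqCst);
        vec![("a".to_string(), 10), ("b".to_string(), 20)]
    };

    // Two references to the same CTE in the outer query.
    let first_ref = cte.get_or_compute(expensive_body);
    let second_ref = cte.get_or_compute(expensive_body);

    // Both references share one cached allocation.
    assert!(Arc::ptr_eq(&first_ref, &second_ref));
    println!("CTE body evaluated {} time(s)", evaluations.load(Ordering::SeqCst));
}
```

   `OnceLock` guarantees the producer runs at most once even if several references race to read the result. The sketch does not address when the cached result is released, which an in-memory implementation would also need to decide.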
   
   - Implements a DuckDB-inspired heuristic: only CTEs ending in expensive operations are materialized
   - Uses `Extension` nodes to avoid modifying the core `LogicalPlan` enum
   - Handles nested CTE dependencies with correct execution ordering
   - Gated behind the `enable_materialized_ctes` config option (default: `true`)
   - Respects explicit `MATERIALIZED` / `NOT MATERIALIZED` SQL hints (PostgreSQL dialect)
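   The materialization decision described in the bullets above can be sketched as a single predicate. This is an illustration under stated assumptions, not the PR's code. `CteRoot`, `Hint`, and `should_materialize` are hypothetical names, and the plan is reduced to its root operator. The precedence is also assumed rather than taken from the implementation: the config flag is checked first, then explicit hints, then the reference-count and root-operator heuristic.

```rust
/// Hypothetical, simplified view of the operator a CTE's plan ends in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CteRoot {
    Aggregate,
    Distinct,
    Window,
    Union,
    Other,
}

/// Explicit SQL hint on the CTE, e.g. `WITH t AS MATERIALIZED (...)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Hint {
    Unspecified,
    Materialized,
    NotMaterialized,
}

/// Hypothetical decision: should this CTE be computed once and cached?
fn should_materialize(enabled: bool, hint: Hint, reference_count: usize, root: CteRoot) -> bool {
    // Assumption: the config flag turns the feature off entirely, hints included.
    if !enabled {
        return false;
    }
    match hint {
        // Assumption: explicit hints override the heuristic.
        Hint::Materialized => true,
        Hint::NotMaterialized => false,
        // Heuristic: multi-referenced CTEs ending in an expensive operator.
        Hint::Unspecified => {
            reference_count > 1
                && matches!(
                    root,
                    CteRoot::Aggregate | CteRoot::Distinct | CteRoot::Window | CteRoot::Union
                )
        }
    }
}

fn main() {
    let cases = [
        (true, Hint::Unspecified, 2, CteRoot::Aggregate),
        (true, Hint::Unspecified, 1, CteRoot::Distinct),
        (true, Hint::Unspecified, 3, CteRoot::Other),
        (true, Hint::NotMaterialized, 2, CteRoot::Window),
        (true, Hint::Materialized, 1, CteRoot::Other),
        (false, Hint::Unspecified, 2, CteRoot::Union),
    ];
    for (enabled, hint, refs, root) in cases {
        let decision = should_materialize(enabled, hint, refs, root);
        println!("enabled={enabled} {hint:?} refs={refs} root={root:?} -> {decision}");
    }
}
```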
   
   ## Benchmark Results (TPC-DS SF1, 10 iterations)
   
   | Query | Baseline | Materialized | Speedup |
   |-------|----------|--------------|---------|
   | Q47 | 401ms | 141ms | **2.85x** |
   | Q57 | 112ms | 42ms | **2.67x** |
   | Q2 | 101ms | 64ms | **1.58x** |
   | Q74 | 311ms | 164ms | **1.90x** |
   | Q75 | 192ms | 164ms | **1.17x** |
   
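   Known limitation: CTEs whose references the outer query filters on different grouping-key values (e.g., TPC-DS Q39) may regress, because the materialized CTE is computed in full once and each reference's filter can no longer be pushed down into it. Users can opt out with `NOT MATERIALIZED`.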
   
   ## Test plan
   - [x] Unit tests for materialization logic (7 tests in sql_integration)
   - [x] All existing CTE tests pass (recursive CTEs unaffected)
   - [x] TPC-DS SF1 full suite (98/99 queries pass; Q30 fails with a pre-existing schema error)
   - [x] Verified no regressions on Q64 (dependency ordering)
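   One standard way to obtain the dependency ordering that the nested-CTE bullet and the Q64 check refer to is a topological sort of the CTE reference graph, so that every CTE is materialized before any CTE that reads it. The sketch below uses Kahn's algorithm; `execution_order` and the example CTE names are hypothetical, and the PR may derive the order differently.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical helper: returns CTE names ordered so each CTE comes after
/// every CTE it references (Kahn's algorithm), or `None` on a cycle.
/// `deps` maps a CTE name to the CTE names it references.
fn execution_order(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // in_degree[cte] = number of referenced CTEs not yet scheduled.
    let mut in_degree: BTreeMap<&str, usize> = BTreeMap::new();
    // dependents[dep] = CTEs that read `dep`.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();

    for (&cte, uses) in deps {
        in_degree.entry(cte).or_insert(0);
        for &dep in uses {
            in_degree.entry(dep).or_insert(0);
            *in_degree.get_mut(cte).unwrap() += 1;
            dependents.entry(dep).or_default().push(cte);
        }
    }

    // Start with CTEs that reference no other CTE.
    let mut ready: VecDeque<&str> = in_degree
        .iter()
        .filter(|(_, d)| **d == 0)
        .map(|(n, _)| *n)
        .collect();

    let mut order = Vec::new();
    while let Some(cte) = ready.pop_front() {
        order.push(cte.to_string());
        // Scheduling `cte` unblocks the CTEs that read it.
        if let Some(children) = dependents.get(cte) {
            for &child in children {
                let d = in_degree.get_mut(child).unwrap();
                *d -= 1;
                if *d == 0 {
                    ready.push_back(child);
                }
            }
        }
    }

    // Any unscheduled CTE means the references form a cycle.
    if order.len() == in_degree.len() {
        Some(order)
    } else {
        None
    }
}

fn main() {
    // Hypothetical nested CTEs: `by_item` reads `sales`; `report` reads both.
    let mut deps = BTreeMap::new();
    deps.insert("report", vec!["by_item", "sales"]);
    deps.insert("by_item", vec!["sales"]);
    deps.insert("sales", vec![]);
    // Prints Some(["sales", "by_item", "report"])
    println!("{:?}", execution_order(&deps));
}
```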
   
   Closes https://github.com/apache/datafusion/issues/17737


-- 
This is an automated message from the Apache Git Service.