adriangb opened a new pull request, #22239:
URL: https://github.com/apache/datafusion/pull/22239

   ## Which issue does this PR close?
   
   There is no dedicated issue for this change.
   
   ## Rationale for this change
   
   Constructing a struct with `named_struct(...)` (or `struct(...)`) and then immediately reading a field back out of it is pure overhead: the intermediate struct never needs to be materialized. This pattern shows up after view/CTE inlining and projection pushdown, where a `named_struct` projection feeds a `get_field` in a parent node.
   
   `get_field(named_struct('min', a, 'max', b), 'max')` is equivalent to `b`, 
so the simplifier can drop the struct entirely.
   
   ## What changes are included in this PR?
   
   A new logical simplification, added to `GetFieldFunc::simplify` (the same 
hook that already flattens nested `get_field` calls):
   
   - `get_field(named_struct('min', a, 'max', b), 'max')` => `b` (lookup by 
name)
   - `get_field(struct(a, b), 'c1')` => `b` (lookup by position, via the implicit field names `c0`, `c1`, ...)
   - nested constructors collapse all the way through, e.g. 
`named_struct('outer', named_struct('inner', a))['outer']['inner']` => `a`
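   As an illustration of the name-based rule and the nested collapse, here is a minimal sketch over a hypothetical toy expression type. `Expr`, `col`, `lit`, `named_struct`, `get_field`, `resolve_named` and `simplify` below are illustrative stand-ins, not DataFusion's `Expr` or the code in this PR. The sketch simplifies bottom-up, so an inner `get_field` is resolved before the outer one sees its result. Positional `struct(...)` lookup is omitted for brevity, and bailing out on duplicate names is an assumption of this sketch, not a description of the PR's behavior.

```rust
/// Hypothetical toy expression tree (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    /// named_struct(name0, value0, name1, value1, ...) as (name, value) pairs
    NamedStruct(Vec<(Expr, Expr)>),
    /// get_field(base, key)
    GetField(Box<Expr>, Box<Expr>),
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}
fn lit(s: &str) -> Expr {
    Expr::Literal(s.to_string())
}
fn named_struct(pairs: Vec<(&str, Expr)>) -> Expr {
    Expr::NamedStruct(pairs.into_iter().map(|(n, v)| (lit(n), v)).collect())
}
fn get_field(base: Expr, key: &str) -> Expr {
    Expr::GetField(Box::new(base), Box::new(lit(key)))
}

/// Resolve `key` against an inline named_struct. `None` means "leave the
/// expression untouched" because the answer is not provably safe.
fn resolve_named(pairs: &[(Expr, Expr)], key: &str) -> Option<Expr> {
    let mut found = None;
    for (name, value) in pairs {
        match name {
            Expr::Literal(n) if n == key => {
                // Assumption of this sketch: bail out on duplicate names.
                if found.is_some() {
                    return None;
                }
                found = Some(value.clone());
            }
            Expr::Literal(_) => {}
            // A non-literal name could equal `key` at runtime: bail out.
            _ => return None,
        }
    }
    found // still None if the constructor does not produce `key`
}

/// Bottom-up rewrite so nested constructors collapse all the way through.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::GetField(base, key) => {
            let base = simplify(*base);
            let key = simplify(*key);
            // Only a literal key over an inline constructor is rewritten.
            if let (Expr::NamedStruct(pairs), Expr::Literal(k)) = (&base, &key) {
                if let Some(value) = resolve_named(pairs, k) {
                    return value;
                }
            }
            Expr::GetField(Box::new(base), Box::new(key))
        }
        Expr::NamedStruct(pairs) => Expr::NamedStruct(
            pairs.into_iter().map(|(n, v)| (simplify(n), simplify(v))).collect(),
        ),
        other => other,
    }
}

fn main() {
    // get_field(named_struct('min', a, 'max', b), 'max') => b
    let e = get_field(named_struct(vec![("min", col("a")), ("max", col("b"))]), "max");
    assert_eq!(simplify(e), col("b"));

    // named_struct('outer', named_struct('inner', a))['outer']['inner'] => a
    let inner = named_struct(vec![("inner", col("a"))]);
    let nested = get_field(get_field(named_struct(vec![("outer", inner)]), "outer"), "inner");
    assert_eq!(simplify(nested), col("a"));

    // A field the constructor does not produce: left untouched.
    let missing = get_field(named_struct(vec![("x", col("a"))]), "y");
    assert_eq!(simplify(missing.clone()), missing);
}
```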
   
   The rewrite is conservative and bails out (leaving the expression untouched) 
whenever it cannot be proven safe:
   
   - a non-literal field key,
   - a `named_struct` with a non-literal field name (which could shadow the 
requested field at runtime),
   - a field the constructor does not produce,
   - non-canonical `struct` field spellings such as `c01`.
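   The positional case and the `c01` guard can be sketched in isolation. This assumes, as described above, that `struct(...)` names its fields `c0`, `c1`, ... with no leading zeros; `positional_index` and `resolve_positional` are hypothetical helpers, not functions from this PR.

```rust
/// Hypothetical helper: map a canonical positional name (`c0`, `c1`, ...,
/// `c10`) to its index. Anything else, including `c01`, yields `None`.
fn positional_index(name: &str) -> Option<usize> {
    let digits = name.strip_prefix('c')?;
    // Require at least one ASCII digit and nothing else (rejects `c`, `c-1`).
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Reject leading zeros: `c01` is not a name that `struct(...)` generates.
    if digits.len() > 1 && digits.starts_with('0') {
        return None;
    }
    digits.parse().ok()
}

/// Resolve `get_field(struct(args...), key)` to the matching argument, or
/// `None` (leave untouched) if the key is non-canonical or out of range.
fn resolve_positional<'a, T>(args: &'a [T], key: &str) -> Option<&'a T> {
    args.get(positional_index(key)?)
}

fn main() {
    let args = ["a", "b"];
    assert_eq!(resolve_positional(&args[..], "c1"), Some(&"b"));
    assert_eq!(resolve_positional(&args[..], "c01"), None); // non-canonical
    assert_eq!(resolve_positional(&args[..], "c2"), None); // not produced
    assert_eq!(resolve_positional(&args[..], "x"), None); // not positional
}
```

   Rejecting non-canonical spellings up front means a key like `c01` is never silently mapped to index 1; the expression is left untouched, just as for any other key the constructor does not produce.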
   
   Casts are intentionally **not** unwrapped: a struct→struct cast can rename, 
retype and reorder fields, so resolving through one correctly is a larger, 
separate change.
   
   ## Are these changes tested?
   
   Yes:
   
   - Unit tests in `getfield.rs` covering matches, duplicate names, nested 
constructors, flatten-then-resolve, and every bail-out guard.
   - `struct.slt`: query + `EXPLAIN` tests showing the field access collapses 
to the underlying column.
   - `order.slt`: two `EXPLAIN` expectations updated. Resolving `get_field(named_struct(...), 'a')` lets the `extract_leaf_expressions` rule skip a now-pointless sort-key extraction. The `SortExec` is still present, as those tests intend, and the sort-elimination cases are unchanged.
   
   The full `sqllogictest` suite, the optimizer crate tests, `cargo clippy --all-targets --all-features -- -D warnings` and `cargo fmt` all pass.
   
   ## Are there any user-facing changes?
   
   No API changes. Query plans involving `get_field` over an inline struct 
constructor are simpler; results are unchanged.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   