adriangb opened a new issue, #22895:
URL: https://github.com/apache/datafusion/issues/22895
### Describe the bug
When a join key contains a `MoveTowardsLeafNodes` expression (e.g.
`get_field`), `ExtractLeafExpressions` rewrites the key into a
`__datafusion_extracted_N` column, and a subsequent `optimize_projections` pass
fails with `Schema error: No field named <column>` for an unrelated column of
the same table. Reproduces on current main (tested at `883c38ee`) with default
settings.
### To Reproduce
```sql
CREATE TABLE rt AS SELECT * FROM (VALUES
(named_struct('uid','u1','t','t1'), TIMESTAMP '2026-06-08T10:00:00', 'a'),
(named_struct('uid','u2','t','t2'), TIMESTAMP '2026-06-08T11:00:00', 'b'))
v(attributes, start_timestamp, span_name);
SELECT r.start_timestamp
FROM rt r JOIN (SELECT attributes['uid'] AS uid FROM rt) f ON f.uid =
r.attributes['uid']
WHERE r.attributes['t'] IN (SELECT attributes['t'] FROM rt);
```
```
Optimizer rule 'optimize_projections' failed
caused by
Schema error: No field named r.start_timestamp.
```
A larger variant of the same query (extra columns, mark-join from the `IN`
subquery) shows what the schema has degraded to:
```
Schema error: No field named r.start_timestamp.
Valid fields are r.__datafusion_extracted_3, r.attributes, f.uid,
f.first_time, __correlated_sq_1.mark.
```
The subquery is not required — a self-join on `get_field` keys plus any
`get_field` in `WHERE` also fails:
```sql
CREATE TABLE rt2 AS SELECT * FROM (VALUES
(named_struct('msg','user auth failed','sid','a'), 1, 'svc1'),
(named_struct('msg','login token','sid','b'), 2, 'svc2')) v(attributes,
id, name);
SELECT a.id, b.name
FROM rt2 a JOIN rt2 b ON a.attributes['sid'] = b.attributes['sid']
WHERE a.attributes['msg'] LIKE '%auth%';
```
```
Optimizer rule 'optimize_projections' failed
caused by
Schema error: No field named b.name.
Valid fields are a.id, a.__datafusion_extracted_1,
b.__datafusion_extracted_2, b.id.
```
Either ingredient alone (join key extraction without the `WHERE`, or the
`WHERE` with a plain join key) works fine.
Workaround (confirmed for all variants):
```sql
SET datafusion.optimizer.enable_leaf_expression_pushdown = false;
```
### Expected behavior
The queries should run; with leaf expression pushdown disabled they return
the expected rows.
### Additional context
Plan progression from `EXPLAIN VERBOSE` for the self-join variant:
`extract_leaf_expressions` rewrites the join to `Inner Join:
__datafusion_extracted_1 = __datafusion_extracted_2` with per-input extraction
projections and a positional recovery projection above the join
(`build_recovery_projection`) that restores `a.attributes, a.id, a.name,
b.attributes, b.id, b.name`. After `push_down_leaf_projections` merges the
extraction projections downward, `optimize_projections` prunes the join inputs
and drops columns that the recovery projection still references. The surviving
field set looks index-shifted rather than name-driven — note `b.id` survives
while `b.name` (the column actually referenced above) vanishes — which suggests
the pruning mis-accounts for the extra extraction columns when computing
required indices for the join inputs.
We also see a related error family in production from the same machinery,
e.g. `Optimizer rule 'push_down_leaf_projections' failed ... Schema contains
qualified field name m.__datafusion_extracted_11 and unqualified field name
__datafusion_extracted_11 which would be ambiguous`, suggesting qualifier
handling of the generated aliases is fragile in multi-input plans.
Related fixed issues in this area, for context — this appears to be a
distinct remaining bug: #20432 (alias collision with user-provided
`__datafusion_extracted` names), #22615 / #22620 (`PushDownLeafProjections` vs
`Unnest`).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]