crm26 commented on PR #21363:
URL: https://github.com/apache/datafusion/pull/21363#issuecomment-4525879360
Thanks for the review and the conflict-resolution ask. The rebase went
smoothly except for one file:
`datafusion/optimizer/src/decorrelate_predicate_subquery.rs`. Main has
refactored the Filter handling — flattening it out of the `match plan` and
adding the `original_schema` re-project from #21265. Our PR adds a second arm
to that same `match` for `LogicalPlan::Projection`, so I needed the `match`
structure back. I restored it and moved the `original_schema` re-project inside
the Filter arm, so we still inherit #21265's fix for that path.
With that done, the test suite is mostly green, but one SLT case now fails:
the two-InSubquery-in-one-SELECT example at `predicates.slt:846`:
```sql
WITH empty AS (SELECT 10 WHERE false)
SELECT
NULL IN (SELECT * FROM empty),
NULL NOT IN (SELECT * FROM empty)
FROM (SELECT 1) t;
```
The failure mode is the same one #21265 addressed for the Filter path: each
`rewrite_inner_subqueries` call builds a join producing a column named `mark`,
and two of those in a single Projection collide. `optimize_projections` rejects
the plan with `Schema contains duplicate unqualified field name mark`.
When I last validated this PR in April, that case passed. It no longer does,
because the duplicate-name check in `optimize_projections` was tightened in May
by #22389 and #21726. Our PR has not changed; the optimizer contract has.
The fix from #21265 — re-projecting the Filter output back to the original
schema width to strip the mark columns — does not transfer cleanly to our
Projection arm. In our case the final Projection's expressions reference those
mark columns directly to form the rewritten predicates, so stripping the
columns would invalidate the expressions. Disambiguating the marks (aliasing
each one uniquely as `rewrite_inner_subqueries` produces it, or rewriting them
at the Projection-arm boundary before we build the final plan) seems to be the
right shape of fix.
My question for you: should that disambiguation land in this PR, or would
you rather merge what's here once I've reverted `predicates.slt:846` to its
prior `query error` form, and handle the multi-InSubquery case as a follow-up
that mirrors what #21265 did for Filter?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]