yashrb24 opened a new pull request, #21247: URL: https://github.com/apache/datafusion/pull/21247
## Which issue does this PR close?

- Closes #21246

## Rationale for this change

`ProjectionExec::gather_filters_for_pushdown` silently rewrites filter predicates to the wrong source column when the output schema contains duplicate column names, a structure that arises above joins where both sides share a column name.

Two functions use name-only schema lookups (`column_with_name` and `index_of`) that always return the first match, which is incorrect when duplicate names exist:

1. `collect_reverse_alias`: a HashMap key collision causes the second duplicate to overwrite the first.
2. `FilterRemapper::try_remap`: `index_of` silently rewrites column indices from non-first duplicates to position 0.

This code path is not exercised through normal SQL because the logical optimizer's `PushDownFilter` resolves qualified column references and pushes filters below projections before the physical plan is created. However, it affects any direct construction of physical plans (custom planners, external systems, the DataFrame API with manual projections).

## What changes are included in this PR?

1. **`collect_reverse_alias`**: Use the `enumerate()` index instead of `column_with_name()`. Projection expressions are positionally aligned with the output schema, so `idx` is the correct output column index.
2. **`gather_filters_for_pushdown`**: Replace `FilterRemapper::try_remap` (which uses `index_of`) with direct validation against the alias map's exact `(name, index)` keys. The `PhysicalColumnRewriter` already does an exact-key lookup, so `try_remap` was both redundant and wrong for this case.

## Are these changes tested?

Yes. A regression test is added that constructs the exact physical plan structure triggering the bug (FilterExec → ProjectionExec with duplicate column names → HashJoinExec), runs the FilterPushdown optimizer, and verifies the optimized plan returns correct results (3 rows instead of the previous 0).

## Are there any user-facing changes?

No API changes.
Fixes incorrect query results for physical plans with duplicate column names in projections.
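
To illustrate the failure mode described in the rationale, here is a minimal standalone sketch using only the Rust standard library. It is not the DataFusion code: the schema is modeled as a plain slice of column names, and `index_of`, `reverse_alias_by_name`, and `reverse_alias_by_position` are hypothetical helpers written for this example. They only mimic the first-match behavior of a name-only lookup and the `(name, index)` keying that the PR describes.

```rust
use std::collections::HashMap;

/// Hypothetical name-only lookup: returns the index of the FIRST column
/// whose name matches, which is how a name-based search behaves when
/// the schema contains duplicate names.
fn index_of(schema: &[&str], name: &str) -> Option<usize> {
    schema.iter().position(|col| *col == name)
}

/// Buggy variant (illustration only): the map is keyed by column name,
/// so a later duplicate overwrites the entry of an earlier one.
/// `sources[i]` is the input column index behind output column `i`.
fn reverse_alias_by_name<'a>(
    output: &[&'a str],
    sources: &[usize],
) -> HashMap<&'a str, usize> {
    output
        .iter()
        .zip(sources)
        .map(|(name, src)| (*name, *src))
        .collect()
}

/// Fixed variant (illustration only): the key is (name, output index),
/// taken from `enumerate()`, so duplicate names stay distinct.
fn reverse_alias_by_position<'a>(
    output: &[&'a str],
    sources: &[usize],
) -> HashMap<(&'a str, usize), usize> {
    output
        .iter()
        .zip(sources)
        .enumerate()
        .map(|(idx, (name, src))| ((*name, idx), *src))
        .collect()
}
```

With output names `["id", "id", "w"]` projected from input columns `[0, 2, 3]`, the name-only lookup of `"id"` resolves to output index 0 regardless of which duplicate was meant, and the name-keyed map keeps only two entries, with `"id"` pointing at input column 2. The position-keyed map keeps all three entries, so `("id", 0)` and `("id", 1)` map back to input columns 0 and 2 respectively.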
