mithuncy opened a new issue, #21168:
URL: https://github.com/apache/datafusion/issues/21168
## Summary
DataFusion's child-filter remapping helpers appear to remap pushed-down
columns by column name via `child_schema.index_of(col.name())`.
That is unsafe for schemas with duplicate field names, because a filter on
the second `id` column can be silently rebound to the first one.
## Affected APIs
- `ChildFilterDescription::from_child(...)`
- `ChildFilterDescription::from_child_with_allowed_indices(...)`
Both appear to go through the same name-based remapping path.
## Why this is a problem
Join outputs can legitimately contain duplicate field names, especially in
self-joins or other unqualified physical schemas.
If the child schema looks like:
- `id@1`
- `id@3`
then a filter referencing `id@3` can be rewritten onto `id@1` when the
remapper uses `index_of("id")`.
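The rebinding can be reproduced with a minimal, std-only sketch. This is not DataFusion code: `Col`, `index_of`, and `remap_by_name` are hypothetical stand-ins, and the schema is modeled as a plain slice of field names. The only assumption carried over is that the name lookup returns the first matching field.

```rust
/// Hypothetical stand-in for a physical column reference, rendered as `name@index`.
#[derive(Debug, Clone, PartialEq)]
struct Col {
    name: String,
    index: usize,
}

/// Hypothetical first-match lookup by field name (assumed to mirror what a
/// name-based `index_of` call does on a schema with duplicate names).
fn index_of(schema: &[&str], name: &str) -> Option<usize> {
    schema.iter().position(|field| *field == name)
}

/// Name-based remapping: the column's original index is ignored, and the
/// column is rebound to the first field in the child schema with that name.
fn remap_by_name(col: &Col, child_schema: &[&str]) -> Option<Col> {
    index_of(child_schema, &col.name).map(|index| Col {
        name: col.name.clone(),
        index,
    })
}
```

With a child schema of `["a", "id", "b", "id"]`, remapping a filter column `id@3` through `remap_by_name` yields `id@1`, so the filter ends up evaluated against a different column without any error.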
## Expected behavior
The remapping should preserve physical column identity rather than only
column name, or it should reject duplicate-name schemas instead of silently
rebinding to the first match.
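Both options can be sketched with the same simplified model. This is an illustration under stated assumptions, not a proposed patch: `remap_by_index`, `remap_by_name_strict`, and the `parent_to_child` index table are hypothetical names, and the schema is again a plain slice of field names.

```rust
/// Option 1: preserve physical identity. The caller supplies a table that
/// maps each parent column index to its index in the child (or `None` if the
/// column does not come from this child). Names are never consulted.
fn remap_by_index(parent_index: usize, parent_to_child: &[Option<usize>]) -> Option<usize> {
    parent_to_child.get(parent_index).copied().flatten()
}

/// Option 2: keep name-based lookup, but fail instead of silently picking
/// the first match when the name is ambiguous.
fn remap_by_name_strict(name: &str, child_schema: &[&str]) -> Result<usize, String> {
    let mut matches = child_schema
        .iter()
        .enumerate()
        .filter(|(_, field)| **field == name)
        .map(|(index, _)| index);
    match (matches.next(), matches.next()) {
        (Some(index), None) => Ok(index),
        (Some(_), Some(_)) => Err(format!("ambiguous column name `{name}`")),
        (None, _) => Err(format!("column `{name}` not found")),
    }
}
```

The index-based variant keeps pushdown working for duplicate-name schemas but needs the parent-to-child mapping to be available. The strict variant is a smaller change, at the cost of rejecting filters on such schemas.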
## Context
This surfaced in ParadeDB while using:
- `TantivyLookupExec` via `ChildFilterDescription::from_child(...)`
- `VisibilityFilterExec` via `ChildFilterDescription::from_child_with_allowed_indices(...)`
We worked around it locally with TODOs, but the underlying behavior appears to
originate upstream in DataFusion.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]