mithuncy opened a new issue, #21168:
URL: https://github.com/apache/datafusion/issues/21168

   ## Summary
   
   DataFusion's child-filter remapping helpers appear to remap pushed-down 
columns by column name via `child_schema.index_of(col.name())`.
   
   That is unsafe for schemas with duplicate field names, because a filter on 
the second `id` column can be silently rebound to the first one.
   
   ## Affected APIs
   
   - `ChildFilterDescription::from_child(...)`
   - `ChildFilterDescription::from_child_with_allowed_indices(...)`
   
   Both appear to go through the same name-based remapping path.
   
   ## Why this is a problem
   
   Join outputs can legitimately contain duplicate field names, especially for
self-joins or any other plan whose physical schema is unqualified.
   
   If the child schema looks like:
   
   - `id@1`
   - `id@3`
   
   then a filter referencing `id@3` can be rewritten onto `id@1` when the 
remapper uses `index_of("id")`.
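
A minimal, self-contained sketch of the rebinding described above. `Schema`, `Column`, and `remap_by_name` here are hypothetical stand-ins, not DataFusion or Arrow types; the only assumption carried over is that a name lookup returns the first matching field. The field names `a` and `b` are invented so that the two `id` columns land at indices 1 and 3.

```rust
/// Hypothetical stand-in for a physical column reference: a name plus an index.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
    index: usize,
}

/// Hypothetical stand-in for a schema: an ordered list of field names.
struct Schema {
    fields: Vec<String>,
}

impl Schema {
    /// First-match lookup by name (the behavior assumed for `index_of` in this sketch).
    fn index_of(&self, name: &str) -> Option<usize> {
        self.fields.iter().position(|f| f == name)
    }
}

/// Name-based remapping: the original index is discarded and only the name is
/// used to locate the column in the child schema.
fn remap_by_name(col: &Column, child: &Schema) -> Option<Column> {
    child.index_of(&col.name).map(|index| Column {
        name: col.name.clone(),
        index,
    })
}
```

With a child schema of `["a", "id", "b", "id"]`, calling `remap_by_name` on a column `id@3` yields `id@1`: the lookup succeeds, so nothing signals that the filter now targets a different column.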
   
   ## Expected behavior
   
   The remapping should preserve physical column identity rather than only 
column name, or it should reject duplicate-name schemas instead of silently 
rebinding to the first match.
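
The two alternatives above could look roughly like the following sketch. Both helpers are hypothetical and are not part of DataFusion; in particular, the `parent_to_child` index map is an assumed input that a real operator would have to derive from its own projection.

```rust
/// Alternative 1: reject ambiguity. Returns the index only when `name`
/// matches exactly one field; duplicate names become an error instead of
/// silently resolving to the first match.
fn index_of_unique(fields: &[&str], name: &str) -> Result<usize, String> {
    let mut matches = fields
        .iter()
        .enumerate()
        .filter(|(_, f)| **f == name)
        .map(|(i, _)| i);
    match (matches.next(), matches.next()) {
        (Some(i), None) => Ok(i),
        (Some(_), Some(_)) => Err(format!("ambiguous column name `{name}`")),
        (None, _) => Err(format!("no column named `{name}`")),
    }
}

/// Alternative 2: preserve identity. `parent_to_child[i]` is the child index
/// that parent column `i` is read from, or `None` if the column does not come
/// from this child. Names are never consulted.
fn remap_by_index(parent_index: usize, parent_to_child: &[Option<usize>]) -> Option<usize> {
    parent_to_child.get(parent_index).copied().flatten()
}
```

The first helper keeps name-based lookup but fails loudly on `["a", "id", "b", "id"]`; the second sidesteps names entirely, at the cost of requiring each operator to supply its index mapping.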
   
   ## Context
   
   This surfaced in ParadeDB while using:
   
   - `TantivyLookupExec` via `ChildFilterDescription::from_child(...)`
   - `VisibilityFilterExec` via `ChildFilterDescription::from_child_with_allowed_indices(...)`
   
   We worked around it locally and left TODOs, but the underlying behavior
appears to originate upstream in DataFusion.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

