Tamar-Posen opened a new issue, #19764:
URL: https://github.com/apache/datafusion/issues/19764

   ### Is your feature request related to a problem or challenge?
   
   With HashJoin dynamic filters enabled in DataFusion 51, the performance 
gains are significant. 
   I’d like to extend this mechanism so that an inverted index, currently 
implemented in a custom TableProvider, can be used for runtime pruning.
   
   Architectural conflict:
   Standard pruning occurs during logical/physical planning, but the values 
required for inverted index lookups (the join keys carried by the dynamic 
filter) are available only during execution, after the build side of the 
hash join completes.
   
   ### Describe the solution you'd like
   
   Introduce a custom execution node (e.g. IndexPruningExec) that wraps the 
DataSourceExec on the probe side of the join and would:
   - Wait for the dynamic filter values produced by HashJoinExec
   - Query the inverted index using those values
   - Dynamically prune files/row groups before executing the underlying 
DataSourceExec
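
   The pruning step in the list above can be sketched independently of 
DataFusion. This is only an illustration under stated assumptions: 
`InvertedIndex`, `files_for_keys`, and `prune_files` are hypothetical names, 
not DataFusion APIs, and the join keys are passed in as a plain slice rather 
than read from a real dynamic filter.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical inverted index: join-key value -> positions of the files
/// that contain that value. Not a DataFusion type.
struct InvertedIndex {
    postings: HashMap<i64, Vec<usize>>,
}

impl InvertedIndex {
    /// Returns the sorted, de-duplicated set of file positions that may
    /// contain at least one of the given keys.
    fn files_for_keys(&self, keys: &[i64]) -> BTreeSet<usize> {
        keys.iter()
            .filter_map(|k| self.postings.get(k))
            .flatten()
            .copied()
            .collect()
    }
}

/// Keeps only the files the index says can match the build-side keys.
/// In the proposed IndexPruningExec this would run after the build side
/// finishes and before the wrapped scan starts (an assumption of this sketch).
fn prune_files(all_files: &[&str], index: &InvertedIndex, build_keys: &[i64]) -> Vec<String> {
    let keep = index.files_for_keys(build_keys);
    all_files
        .iter()
        .enumerate()
        .filter(|(i, _)| keep.contains(i))
        .map(|(_, f)| f.to_string())
        .collect()
}

fn main() {
    let index = InvertedIndex {
        postings: HashMap::from([(1, vec![0]), (2, vec![0, 2]), (9, vec![1])]),
    };
    let files = ["a.parquet", "b.parquet", "c.parquet"];
    // Keys 2 and 1 stand in for values observed on the build side of the join.
    let remaining = prune_files(&files, &index, &[2, 1]);
    println!("{:?}", remaining);
}
```

   The sketch keeps the lookup separate from the scan, so the same function 
could in principle sit behind either a wrapper node or a pruning hook, which 
is the design question raised below.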
   
   Questions:
   - Does this “wrapper exec” approach align with DataFusion’s long-term 
execution model?
   - Is there an existing extension point intended for this kind of 
late-binding runtime pruning?
   - Are there alternative designs (e.g., a pruning or predicate hook) that 
would be a better fit?
   
   ### Describe alternatives you've considered
   
   - Physical optimizer rules: not viable, because they run before the 
dynamic filter values are produced.
   - Logical sideways information passing: too complex, and it duplicates 
work already done by the join's hash table.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

