Tamar-Posen opened a new issue, #19764: URL: https://github.com/apache/datafusion/issues/19764
### Is your feature request related to a problem or challenge?

Using DataFusion 51 with HashJoin dynamic filters enabled, the performance gains are significant. I’d like to extend this mechanism to leverage an inverted index, currently implemented in a custom TableProvider, for runtime pruning.

Architectural conflict: Standard pruning occurs during logical/physical planning, but the values required for inverted index lookups (join keys - dynamic filter) are available only during the execution phase, after the build side of the hash join completes.

### Describe the solution you'd like

Introduce a custom execution node (e.g. IndexPruningExec) that wraps the DataSourceExec on the probe side of the join:

- Wait for dynamic filter values produced by HashJoinExec
- Query the inverted index using those values
- Dynamically prune files/row groups before executing the underlying DataSourceExec.

Questions

- Does this “wrapper exec” approach align with DataFusion’s long-term execution model?
- Is there an existing extension point intended for this kind of late-binding runtime pruning?
- Are there alternative designs (e.g., a pruning or predicate hook) that would be a better fit?

### Describe alternatives you've considered

Physical Optimizer Rules: Not viable because they run before dynamic filter values are produced.

Logical Sideways Information Passing: Too complex, duplicates work already done by the join's hash table, etc.

### Additional context

_No response_
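To make the proposed flow concrete, here is a minimal, dependency-free sketch of the pruning step that a wrapper such as IndexPruningExec might perform. It does not use any DataFusion API. `InvertedIndex`, `DynamicFilterState`, and `prune_files` are hypothetical names introduced only for illustration. The sketch also assumes the dynamic filter can be reduced to a plain list of integer join keys, which may not match how the filter is actually represented.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical inverted index: join-key value -> IDs of files that contain it.
struct InvertedIndex {
    postings: HashMap<i64, BTreeSet<usize>>,
}

impl InvertedIndex {
    fn new() -> Self {
        Self { postings: HashMap::new() }
    }

    /// Record that `file_id` contains at least one row with `key`.
    fn insert(&mut self, key: i64, file_id: usize) {
        self.postings.entry(key).or_default().insert(file_id);
    }

    /// Union of the posting lists for all requested keys.
    fn files_for_keys(&self, keys: &[i64]) -> BTreeSet<usize> {
        keys.iter()
            .filter_map(|k| self.postings.get(k))
            .flatten()
            .copied()
            .collect()
    }
}

/// Stand-in for the dynamic filter (an assumption, not a DataFusion type).
/// `Pending` models the time before the hash join build side has completed.
enum DynamicFilterState {
    Pending,
    Ready(Vec<i64>),
}

/// Decide which files the probe-side scan still has to read.
/// With no filter values yet, nothing can be pruned safely.
fn prune_files(
    all_files: &[usize],
    index: &InvertedIndex,
    filter: &DynamicFilterState,
) -> Vec<usize> {
    match filter {
        DynamicFilterState::Pending => all_files.to_vec(),
        DynamicFilterState::Ready(keys) => {
            let hits = index.files_for_keys(keys);
            all_files
                .iter()
                .copied()
                .filter(|f| hits.contains(f))
                .collect()
        }
    }
}

fn main() {
    let mut index = InvertedIndex::new();
    index.insert(10, 0);
    index.insert(10, 2);
    index.insert(20, 1);
    index.insert(30, 3);

    let all_files = [0, 1, 2, 3];

    // Before the build side finishes, every file must be scanned.
    let before = prune_files(&all_files, &index, &DynamicFilterState::Pending);
    assert_eq!(before, vec![0, 1, 2, 3]);

    // Once the join keys are known, file 1 (which only holds key 20) is skipped.
    let ready = DynamicFilterState::Ready(vec![10, 30]);
    let after = prune_files(&all_files, &index, &ready);
    assert_eq!(after, vec![0, 2, 3]);
    println!("files to scan: {:?}", after);
}
```

The `Pending` branch is the important design point: pruning is only safe once the build side has produced its keys, which is why this logic has to run at execution time rather than in a planner or optimizer rule.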
