adriangb commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3905154744
I do get your point. TPC-H / TPC-DS will essentially not use late materialization (`RowFilter`) because, like you say, all files are opened at once.

> Because a disabled filter now always returns "true" it scans the column while no longer contributing to making the selection smaller

I assume by "disabled filter" you mean the cases where we completely discard a `DynamicFilterPhysicalExpr`? I don't think those should evaluate to `true`; ideally they should be removed completely. https://github.com/apache/datafusion/pull/20160 does not do that (as you say, it returns `true`; side note: I wonder if we can optimize all-true / all-false masks). I am trying to address that in https://github.com/apache/datafusion/pull/20363, which essentially implements the suggestion above:

> It might be possible to merge the efforts if we e.g. add `PhysicalExpr::is_discardable_filter() -> bool` or something, then the more general adaptive selectivity machinery can choose to discard the filter instead of just putting it last
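A minimal sketch of the quoted idea, assuming a simplified filter trait rather than DataFusion's real `PhysicalExpr`. The names `Filter`, `FnFilter`, `Tracked`, `run_filters`, `reorder_and_discard` and the `keep_below` threshold are hypothetical; only `is_discardable_filter` comes from the suggestion itself. It shows adaptive ordering that drops a discardable filter once it stops being selective (instead of just moving it last), plus short-circuits for all-true and all-false masks. It is not how either linked PR is implemented.

```rust
/// Hypothetical, simplified stand-in for a pushed-down predicate.
/// NOT DataFusion's `PhysicalExpr`; it only models the proposed hook.
trait Filter {
    fn name(&self) -> &str;
    /// Evaluate the predicate on a batch, returning a selection mask.
    fn evaluate(&self, values: &[i64]) -> Vec<bool>;
    /// True if the filter is only an optimization (e.g. a dynamic filter
    /// derived from a join), so dropping it cannot change query results.
    fn is_discardable_filter(&self) -> bool {
        false
    }
}

/// Hypothetical concrete filter backed by a plain function pointer.
struct FnFilter {
    name: &'static str,
    discardable: bool,
    pred: fn(i64) -> bool,
}

impl Filter for FnFilter {
    fn name(&self) -> &str {
        self.name
    }
    fn evaluate(&self, values: &[i64]) -> Vec<bool> {
        values.iter().map(|&v| (self.pred)(v)).collect()
    }
    fn is_discardable_filter(&self) -> bool {
        self.discardable
    }
}

/// A filter plus the row counts observed while evaluating it.
struct Tracked {
    filter: Box<dyn Filter>,
    rows_in: usize,
    rows_out: usize,
}

impl Tracked {
    fn new(filter: Box<dyn Filter>) -> Self {
        Tracked { filter, rows_in: 0, rows_out: 0 }
    }
    /// Fraction of input rows that passed; treated as 0.0 ("selective")
    /// until the filter has seen any rows.
    fn selectivity(&self) -> f64 {
        if self.rows_in == 0 {
            0.0
        } else {
            self.rows_out as f64 / self.rows_in as f64
        }
    }
}

/// Apply filters in order, recording statistics. Stops once the selection
/// is empty (all-false), so later filters never run on that batch.
fn run_filters(filters: &mut [Tracked], batch: &[i64]) -> Vec<i64> {
    let mut selected = batch.to_vec();
    for t in filters.iter_mut() {
        if selected.is_empty() {
            break;
        }
        let mask = t.filter.evaluate(&selected);
        t.rows_in += selected.len();
        // All-true mask: nothing to remove, so skip rebuilding the batch.
        if mask.iter().all(|&m| m) {
            t.rows_out += selected.len();
            continue;
        }
        selected = selected
            .iter()
            .zip(&mask)
            .filter(|&(_, &keep)| keep)
            .map(|(&v, _)| v)
            .collect();
        t.rows_out += selected.len();
    }
    selected
}

/// Drop discardable filters that barely reduce the row count, then order
/// the remaining filters so the most selective one runs first.
fn reorder_and_discard(mut filters: Vec<Tracked>, keep_below: f64) -> Vec<Tracked> {
    filters.retain(|t| !(t.filter.is_discardable_filter() && t.selectivity() >= keep_below));
    filters.sort_by(|a, b| a.selectivity().total_cmp(&b.selectivity()));
    filters
}
```

For example, on a batch `0..100` a discardable dynamic bound `v < 1000` passes every row (selectivity 1.0), so with `keep_below = 0.95` it is dropped, while a non-discardable `v % 2 == 0` filter is kept regardless of its selectivity. A non-discardable filter can only be reordered, never removed, because removing it would change results.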
