theirix opened a new issue, #16545: URL: https://github.com/apache/datafusion/issues/16545
### Describe the bug

This is a follow-up to a discussion in https://github.com/apache/datafusion/pull/16325#issuecomment-2985522134. It is not directly related to table sampling, but it could affect it. I'd like to double-check whether pushing a volatile filter down to the Parquet executor is expected.

In #13268 I implemented disabling pushdown of volatile filters in the logical plan. However, the physical optimiser still seems to push such a predicate down to the executor. Should we implement a similar mechanism that marks volatile predicates as unsupported filters? The current physical plan implementation already has a concept of "unsupported" filters, which could easily be reused for this.

Current behaviour:

Before (plan after `LimitedDistinctAggregation`, with a separate `FilterExec`):

```
[2025-06-18T18:20:07Z TRACE datafusion::physical_planner] Optimized physical plan by LimitedDistinctAggregation:
OutputRequirementExec
  ProjectionExec: expr=[count(Int64(1))@0 as count(*)]
    AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]
      AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]
        FilterExec: random() < 0.1
          DataSourceExec: file_groups={1 group: [[sample.parquet]]}, file_type=parquet
```

After (plan after `FilterPushdown`: the `FilterExec` is gone and `random() < 0.1` has become the `DataSourceExec` predicate):

```
[2025-06-18T18:20:07Z TRACE datafusion::physical_planner] Optimized physical plan by FilterPushdown:
OutputRequirementExec
  ProjectionExec: expr=[count(Int64(1))@0 as count(*)]
    AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]
      AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]
        DataSourceExec: file_groups={1 group: [[sample.parquet]]}, file_type=parquet, predicate=random() < 0.1
```

### To Reproduce

```sql
set datafusion.execution.parquet.pushdown_filters=true;
create external table data stored as parquet location 'sample.parquet';
SELECT count(*) FROM data WHERE random() < 0.1;
```

### Expected behavior

I expect the physical optimiser not to push down volatile predicates.

### Additional context

_No response_
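To make the proposal concrete, here is a minimal, self-contained Rust sketch of the rule "a volatile predicate is reported as an unsupported filter, so it stays in a `FilterExec` above the scan". It does not use DataFusion's real physical expression or filter-pushdown API: `Expr`, `is_volatile`, `PushdownSupport` and `classify_filters` are hypothetical names, and `Volatility` is redefined locally, only loosely mirroring DataFusion's Immutable / Stable / Volatile levels. A volatile call anywhere in the expression tree marks the whole predicate as unsupported.

```rust
/// Function volatility, loosely mirroring DataFusion's Immutable / Stable /
/// Volatile levels (redefined here so the sketch is self-contained).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Volatility {
    Immutable,
    Stable,
    Volatile,
}

/// Hypothetical, heavily simplified predicate expression tree.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Literal(f64),
    /// Scalar function call, e.g. `random()` or `abs(x)`.
    Call { name: String, volatility: Volatility, args: Vec<Expr> },
    /// `left < right`
    Lt(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// True if any node in the tree is a volatile function call.
    fn is_volatile(&self) -> bool {
        match self {
            Expr::Column(_) | Expr::Literal(_) => false,
            Expr::Call { volatility, args, .. } => {
                *volatility == Volatility::Volatile || args.iter().any(Expr::is_volatile)
            }
            Expr::Lt(left, right) => left.is_volatile() || right.is_volatile(),
        }
    }
}

/// Hypothetical per-filter answer a scan could give to the optimiser.
#[derive(Debug, PartialEq, Eq)]
enum PushdownSupport {
    Supported,
    Unsupported,
}

/// Marks every volatile filter as unsupported (it stays above the scan);
/// all other filters may be pushed into the scan.
fn classify_filters(filters: &[Expr]) -> Vec<PushdownSupport> {
    filters
        .iter()
        .map(|f| {
            if f.is_volatile() {
                PushdownSupport::Unsupported
            } else {
                PushdownSupport::Supported
            }
        })
        .collect()
}

fn main() {
    // random() < 0.1, the predicate from the reproduction query
    let random_filter = Expr::Lt(
        Box::new(Expr::Call {
            name: "random".to_string(),
            volatility: Volatility::Volatile,
            args: vec![],
        }),
        Box::new(Expr::Literal(0.1)),
    );
    // abs(x) < 0.1 is deterministic, so this sketch allows pushing it down
    let abs_filter = Expr::Lt(
        Box::new(Expr::Call {
            name: "abs".to_string(),
            volatility: Volatility::Immutable,
            args: vec![Expr::Column("x".to_string())],
        }),
        Box::new(Expr::Literal(0.1)),
    );
    // prints [Unsupported, Supported]
    println!("{:?}", classify_filters(&[random_filter, abs_filter]));
}
```

This sketch only rejects `Volatile` expressions and treats `Stable` ones as pushable; that split is an assumption for illustration, not a decision taken in this issue.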