2010YOUY01 commented on PR #18321: URL: https://github.com/apache/datafusion/pull/18321#issuecomment-3459671363
There are many test fixes due to the behavior change of `row_groups_matched_bloom_filter`. I think the updated behavior is more reasonable, and I have verified that the test changes are expected.

The old behavior: if a bloom filter is not available, the 'matched' count is 0. This PR changes the `matched` count to the total number of row groups that remain to be scanned. The Parquet scanner first checks statistics and then bloom filters for row group pruning, so the metrics are displayed as follows:

- Case 1: both statistics and bloom filters successfully pruned row groups:
  `row_groups_pruned_statistics=10 total -> 7 matched`, `row_groups_pruned_bloom_filter=7 total -> 3 matched`
- Case 2: statistics successfully pruned row groups, and no bloom filter is available:
  `row_groups_pruned_statistics=10 total -> 7 matched`, `row_groups_pruned_bloom_filter=7 total -> 7 matched`
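To make the counting rule concrete, here is a minimal Rust sketch. It is not DataFusion's actual code: `StageMetric` and `prune_stage` are hypothetical names used only for illustration. Each stage may only drop a row group it can prove has no matches (`Some(false)`). When no bloom filter is available, the check returns `None`, so every incoming row group is kept and counted as matched, which reproduces the two cases described in the comment.

```rust
/// Hypothetical per-stage metric (not DataFusion's API): how many row groups
/// entered the stage (`total`) and how many survived it (`matched`).
#[derive(Debug, PartialEq)]
struct StageMetric {
    total: usize,
    matched: usize,
}

/// Hypothetical pruning stage. `may_match` returns:
/// - `Some(false)`: the row group provably has no matches, so it is pruned;
/// - `Some(true)`: the row group may match, so it is kept;
/// - `None`: the check cannot run (e.g. no bloom filter), so it is kept.
fn prune_stage<F>(row_groups: &[usize], may_match: F) -> (Vec<usize>, StageMetric)
where
    F: Fn(usize) -> Option<bool>,
{
    let kept: Vec<usize> = row_groups
        .iter()
        .copied()
        .filter(|&rg| may_match(rg) != Some(false))
        .collect();
    let metric = StageMetric { total: row_groups.len(), matched: kept.len() };
    (kept, metric)
}

fn main() {
    let all: Vec<usize> = (0..10).collect();

    // Statistics run first; this hypothetical predicate prunes row groups 7, 8, 9.
    let (after_stats, stats) = prune_stage(&all, |rg| Some(rg < 7));

    // Case 1: bloom filters exist and rule out 4 of the 7 remaining row groups.
    let (_, bf_case1) = prune_stage(&after_stats, |rg| Some(rg < 3));

    // Case 2: no bloom filter is available, so nothing can be pruned.
    let (_, bf_case2) = prune_stage(&after_stats, |_| None);

    println!("row_groups_pruned_statistics={} total -> {} matched", stats.total, stats.matched);
    println!("case 1: row_groups_pruned_bloom_filter={} total -> {} matched", bf_case1.total, bf_case1.matched);
    println!("case 2: row_groups_pruned_bloom_filter={} total -> {} matched", bf_case2.total, bf_case2.matched);
}
```

Treating "cannot evaluate" the same as "may match" keeps each stage's `matched` equal to the number of row groups handed to the next step, so the chain of `total -> matched` values reads consistently from one stage to the next.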
