crepererum opened a new issue, #5614: URL: https://github.com/apache/arrow-datafusion/issues/5614
A `ParquetExec` is created from a `FileScanConfig` and an optional filter predicate[^size_hint]. These are two different, independent parameters; at least the documentation does not imply that the predicate should be considered when constructing the `FileScanConfig`.

The statistics for the `ParquetExec` are calculated by `FileScanConfig::project`:

https://github.com/apache/arrow-datafusion/blob/0f6931caa6f8b48e116a8e77e989c404f31f3f8d/datafusion/core/src/physical_plan/file_format/mod.rs#L213-L219

This forwards `is_exact` from the input, which might have been set to `true`. However, if there is a predicate, `is_exact` should likely be `false`, because the predicate may remove some data, which would invalidate the exact statistics. So either the forwarding is wrong (at least when a predicate is given) or the docs are imprecise.

Note that this is unrelated to #5613, because this issue is about the `is_exact=true` case.

[^size_hint]: And a metadata size hint, but this is irrelevant here.
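
To make the concern concrete, here is a minimal sketch of the two behaviors. It does not use the DataFusion crate: the `Statistics` struct below is a simplified, hypothetical stand-in with only the fields relevant here, and both function names (`project_forwarding`, `project_with_predicate`) are illustrative assumptions, not existing DataFusion APIs. The second function shows one possible fix, not a merged change.

```rust
/// Simplified, hypothetical stand-in for DataFusion's `Statistics`.
/// Only the fields relevant to this issue are modeled.
#[derive(Debug, Clone, PartialEq)]
struct Statistics {
    num_rows: Option<usize>,
    total_byte_size: Option<usize>,
    is_exact: bool,
}

/// Behavior as described in the issue: `is_exact` is forwarded from the
/// input unchanged, regardless of whether a filter predicate exists.
fn project_forwarding(input: &Statistics) -> Statistics {
    input.clone()
}

/// One possible fix (an assumption for illustration): keep the estimates,
/// but downgrade `is_exact` to `false` whenever a predicate is present,
/// because the predicate may remove rows from the scan output.
fn project_with_predicate(input: &Statistics, has_predicate: bool) -> Statistics {
    Statistics {
        is_exact: input.is_exact && !has_predicate,
        ..input.clone()
    }
}
```

With an input marked `is_exact: true`, the first function still reports exact statistics even when a predicate is set, while the second reports them as inexact and leaves `num_rows` and `total_byte_size` untouched as upper-bound estimates.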
