crepererum opened a new issue, #5614:
URL: https://github.com/apache/arrow-datafusion/issues/5614

   A `ParquetExec` is created from a `FileScanConfig` and an optional filter 
predicate[^size_hint]. These two are different, independent parameters -- at 
least the documentation does not imply that the predicate should be considered 
when constructing the `FileScanConfig`. The statistics for the 
`ParquetExec` are calculated by `FileScanConfig::project`:
   
   
https://github.com/apache/arrow-datafusion/blob/0f6931caa6f8b48e116a8e77e989c404f31f3f8d/datafusion/core/src/physical_plan/file_format/mod.rs#L213-L219
   
   This forwards `is_exact` from the input, which might have been set to `true`. 
However, if there is a predicate, `is_exact` should likely be `false`, because 
the predicate may remove some data and invalidate the exact statistics. So 
either the forwarding is wrong (at least when a predicate is given) or the docs 
are imprecise.
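
   To illustrate the first option, here is a minimal sketch of downgrading 
`is_exact` when a predicate is present. The `Statistics` struct below is a 
simplified stand-in, and `statistics_with_predicate` is a hypothetical helper; 
neither is DataFusion's actual code, and the real fix may look different.

```rust
/// Simplified, hypothetical stand-in for a statistics struct.
/// It only models the two fields relevant to this issue.
#[derive(Debug, Clone, PartialEq)]
pub struct Statistics {
    pub num_rows: Option<usize>,
    pub is_exact: bool,
}

/// Hypothetical helper (not part of DataFusion): forwards the input
/// statistics, but clears `is_exact` whenever a filter predicate is present,
/// because the predicate may remove rows and the counts become upper bounds.
pub fn statistics_with_predicate(input: &Statistics, has_predicate: bool) -> Statistics {
    Statistics {
        num_rows: input.num_rows,
        // Exact only if the input was exact AND nothing can filter rows out.
        is_exact: input.is_exact && !has_predicate,
    }
}
```

   With this sketch, exact input statistics stay exact when no predicate is 
given and become inexact as soon as one is supplied; input that was already 
inexact is unaffected.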
   
   Note that this is unrelated to #5613 because this issue is about the 
`is_exact=true` case.
   
   [^size_hint]: And a metadata size hint, but this is irrelevant here.


