alamb opened a new issue, #14936: URL: https://github.com/apache/datafusion/issues/14936
### Describe the bug

As @blaginin found in https://github.com/apache/datafusion/pull/14685, the statistics are incorrect when a file is projected (that is, when only a subset of the columns is present).

Specifically, the projected statistics have the same `total_byte_size` as the input. Because only a subset of the columns is selected, `total_byte_size` should be lower.

### To Reproduce

See the tests referenced in https://github.com/apache/datafusion/pull/14685

### Expected behavior

`total_byte_size` should take into account the subset of columns

### Additional context

_No response_
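
As a rough illustration of the expected behavior, here is a minimal sketch of projecting statistics so that `total_byte_size` reflects only the selected columns. It does not use DataFusion's own types: `FileStats`, `column_byte_sizes`, and `project_stats` are hypothetical names invented for this example, and the sketch assumes a per-column byte size is available, which may not hold for real file formats.

```rust
/// Hypothetical, simplified stand-in for file-level statistics.
/// This is NOT DataFusion's statistics type; `None` means "unknown".
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    num_rows: Option<usize>,
    total_byte_size: Option<usize>,
    /// Assumed per-column byte sizes, indexed by column position.
    column_byte_sizes: Vec<Option<usize>>,
}

/// Project `stats` onto the columns listed in `projection`.
/// The row count is unchanged, but the byte size is recomputed from
/// the projected columns only.
fn project_stats(stats: &FileStats, projection: &[usize]) -> FileStats {
    // Pick the sizes of the projected columns; an out-of-range index
    // is treated as an unknown size rather than a panic.
    let column_byte_sizes: Vec<Option<usize>> = projection
        .iter()
        .map(|&i| stats.column_byte_sizes.get(i).copied().flatten())
        .collect();

    // Summing `Option`s yields `None` as soon as one size is unknown,
    // so the result never claims more precision than the inputs had.
    let total_byte_size: Option<usize> = column_byte_sizes.iter().copied().sum();

    FileStats {
        num_rows: stats.num_rows,
        total_byte_size,
        column_byte_sizes,
    }
}
```

With three columns of 400, 800, and 400 bytes, projecting onto the first column alone would report 400 bytes instead of the input's 1600. If any projected column has an unknown size, the sketch reports the total as unknown rather than reusing the input value.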