alamb opened a new issue, #15689: URL: https://github.com/apache/datafusion/issues/15689
### Describe the bug

As @xudong963 mentions in
- https://github.com/xudong963/arrow-datafusion/pull/5#discussion_r2034641672

and as was brought up again in
- https://github.com/apache/datafusion/pull/15661

When `table_schema` is different from `file_schema`, the current statistics merging code merges statistics incorrectly. Specifically, it merges column statistics based on their ordinal position (their order in the file).

Currently this isn't a huge problem, as the statistics are only used in a limited way for some optimizations, but as we start to rely on statistics for correctness, such as in https://github.com/apache/datafusion/issues/6672, it becomes more important.

### To Reproduce

If we have two files:
* File 1: `(a int32, b int32)`
* File 2: `(b int32, a int32)`

I think the code on main will combine the statistics for column `a` in File 1 and column `b` in File 2 together.

### Expected behavior

I expect that only statistics from the same logical column are merged together.

### Additional context

After https://github.com/apache/datafusion/pull/15661 is merged, I suggest:
1. Adding some function that knows how to map columns from a file schema --> table schema (filling in any missing columns with `ColumnStatistics::new_unknown`) before combining them
2. Adding tests

Maybe we can simply reuse the existing [`SchemaMapper`](https://docs.rs/datafusion/latest/datafusion/datasource/schema_adapter/trait.SchemaMapper.html) / factory 🤔 so we are sure the statistics merging is consistent with what happens at runtime.
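To make the suggestion concrete, here is a minimal sketch of mapping by column name before merging. It uses only the standard library. `ColStats`, `map_to_table_schema`, and `merge_columns` are hypothetical names invented for illustration and are not DataFusion APIs; `ColStats` is a simplified stand-in for `ColumnStatistics`, with `None` meaning "unknown".

```rust
use std::collections::HashMap;

/// Hypothetical, simplified per-column statistics (not DataFusion's type).
/// `None` means the value is unknown.
#[derive(Debug, Clone, PartialEq, Default)]
struct ColStats {
    min: Option<i32>,
    max: Option<i32>,
    null_count: Option<usize>,
}

/// Reorder one file's column statistics into table-schema order, matching
/// columns by name. Table columns missing from the file get unknown stats.
fn map_to_table_schema(
    table_cols: &[&str],
    file_cols: &[&str],
    file_stats: &[ColStats],
) -> Vec<ColStats> {
    // Column name -> ordinal position within this particular file.
    let index_of: HashMap<&str, usize> = file_cols
        .iter()
        .enumerate()
        .map(|(i, name)| (*name, i))
        .collect();

    table_cols
        .iter()
        .map(|name| match index_of.get(name) {
            Some(&i) => file_stats.get(i).cloned().unwrap_or_default(),
            None => ColStats::default(),
        })
        .collect()
}

/// Merge two statistics vectors that are already in table-schema order.
/// Any unknown input makes the corresponding output unknown.
fn merge_columns(a: &[ColStats], b: &[ColStats]) -> Vec<ColStats> {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| ColStats {
            min: x.min.zip(y.min).map(|(l, r)| l.min(r)),
            max: x.max.zip(y.max).map(|(l, r)| l.max(r)),
            null_count: x.null_count.zip(y.null_count).map(|(l, r)| l + r),
        })
        .collect()
}
```

With this shape, File 1 `(a, b)` and File 2 `(b, a)` would each be passed through `map_to_table_schema` first, so the merge step only ever sees statistics that are already aligned to the table schema. A real fix would presumably go through the existing schema mapping machinery rather than a hand-rolled name lookup like this one.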
---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org