okumin opened a new pull request, #5607: URL: https://github.com/apache/hive/pull/5607
### What changes were proposed in this pull request?

Let `HiveIcebergStorageHandler` respect time travel conditions when fetching column statistics from Puffin files.

### Why are the changes needed?

Currently, `HiveIcebergStorageHandler#canProvideColStatistics` and `HiveIcebergStorageHandler#getColStatistics` retrieve statistics based on the current snapshot ID, while `HiveIcebergStorageHandler#getBasicStatistics` respects time travel conditions. As a result, column statistics can be inconsistent with basic statistics. For example, column stats can report 100 null values even though basic stats report only 10 rows in total. Such a mismatch can mislead Hive when it builds an execution plan.

### Does this PR introduce _any_ user-facing change?

Yes. Time travel queries can get better execution plans, since column statistics now correspond to the queried snapshot.

### Is the change a dependency upgrade?

No.

### How was this patch tested?

I added a new qtest. [The first commit](https://github.com/apache/hive/commit/ee04e72f84006e81db7437b72a8a9952d1373ac5) includes a `q.out` reproducing the problem.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
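The mismatch between column statistics and basic statistics described in this pull request can be illustrated with a small, self-contained Java model. This is only a sketch: it does not use Hive or Iceberg APIs, and the class `TimeTravelStatsSketch`, its method names, and the numbers are all hypothetical, chosen to mirror the "100 null values versus 10 rows" example.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical model, not the Hive or Iceberg API: statistics are keyed by snapshot ID.
class TimeTravelStatsSketch {
    // Snapshot 1 is the older snapshot, snapshot 2 is the current one (made-up numbers).
    static final long CURRENT_SNAPSHOT_ID = 2L;
    static final Map<Long, Long> ROW_COUNT_BY_SNAPSHOT = Map.of(1L, 10L, 2L, 1000L);
    static final Map<Long, Long> NULL_COUNT_BY_SNAPSHOT = Map.of(1L, 2L, 2L, 100L);

    // A time travel query pins a snapshot; otherwise the current snapshot is used.
    static long resolveSnapshotId(Optional<Long> asOfSnapshotId) {
        return asOfSnapshotId.orElse(CURRENT_SNAPSHOT_ID);
    }

    // Basic statistics: follow the time travel condition.
    static long basicRowCount(Optional<Long> asOfSnapshotId) {
        return ROW_COUNT_BY_SNAPSHOT.get(resolveSnapshotId(asOfSnapshotId));
    }

    // Column statistics that always read the current snapshot (the inconsistent case).
    static long nullCountIgnoringTimeTravel() {
        return NULL_COUNT_BY_SNAPSHOT.get(CURRENT_SNAPSHOT_ID);
    }

    // Column statistics resolved against the same snapshot as the basic statistics.
    static long nullCountRespectingTimeTravel(Optional<Long> asOfSnapshotId) {
        return NULL_COUNT_BY_SNAPSHOT.get(resolveSnapshotId(asOfSnapshotId));
    }

    public static void main(String[] args) {
        Optional<Long> asOf = Optional.of(1L); // time travel to the older snapshot
        System.out.println("rows (basic stats):          " + basicRowCount(asOf));
        System.out.println("nulls (current snapshot):    " + nullCountIgnoringTimeTravel());
        System.out.println("nulls (time travel aware):   " + nullCountRespectingTimeTravel(asOf));
    }
}
```

In this model, a query pinned to snapshot 1 sees a null count larger than the row count when column statistics ignore the pinned snapshot; resolving both kinds of statistics through the same snapshot ID keeps them consistent.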