okumin opened a new pull request, #5607:
URL: https://github.com/apache/hive/pull/5607

   ### What changes were proposed in this pull request?
   
   Make `HiveIcebergStorageHandler` respect time travel conditions when 
fetching column statistics from Puffin files.
   
   ### Why are the changes needed?
   
   Currently, `HiveIcebergStorageHandler#canProvideColStatistics` and 
`HiveIcebergStorageHandler#getColStatistics` retrieve statistics based on the 
current snapshot id, while `HiveIcebergStorageHandler#getBasicStatistics` 
respects time travel conditions. As a result, column statistics can be 
inconsistent with basic stats in a time travel query. For example, column 
stats can report 100 null values even though basic stats report only 10 rows 
in total. Such contradictory statistics can mislead Hive when it builds an 
execution plan.
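
The mismatch described above can be illustrated with a minimal, self-contained sketch. It does not use the Hive or Iceberg APIs: `SnapshotStats`, `Table`, and every method name below are hypothetical stand-ins, and the figures for the current snapshot (500 rows) and the older snapshot (2 nulls) are made up for illustration. It only shows why column statistics should be looked up with the same resolved snapshot id as basic statistics.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only: none of these types are Hive or Iceberg classes.
public class SnapshotStatsSketch {

  // Hypothetical per-snapshot statistics: total rows and the null count of one column.
  static final class SnapshotStats {
    final long rowCount;
    final long nullCount;

    SnapshotStats(long rowCount, long nullCount) {
      this.rowCount = rowCount;
      this.nullCount = nullCount;
    }
  }

  // Hypothetical table: statistics keyed by snapshot id, plus the current snapshot id.
  static final class Table {
    final Map<Long, SnapshotStats> statsBySnapshot = new HashMap<>();
    long currentSnapshotId;
  }

  // The snapshot a query reads: the time travel target if given, else the current one.
  static long resolveSnapshotId(Table table, Long asOfSnapshotId) {
    return asOfSnapshotId != null ? asOfSnapshotId : table.currentSnapshotId;
  }

  static SnapshotStats statsFor(Table table, long snapshotId) {
    SnapshotStats stats = table.statsBySnapshot.get(snapshotId);
    if (stats == null) {
      throw new IllegalArgumentException("Unknown snapshot id: " + snapshotId);
    }
    return stats;
  }

  // Basic stats already follow the time travel condition.
  static long basicRowCount(Table table, Long asOfSnapshotId) {
    return statsFor(table, resolveSnapshotId(table, asOfSnapshotId)).rowCount;
  }

  // Problematic behaviour: column stats always come from the current snapshot.
  static long nullCountIgnoringTimeTravel(Table table) {
    return statsFor(table, table.currentSnapshotId).nullCount;
  }

  // Intended behaviour: column stats use the same resolved snapshot as basic stats.
  static long nullCountRespectingTimeTravel(Table table, Long asOfSnapshotId) {
    return statsFor(table, resolveSnapshotId(table, asOfSnapshotId)).nullCount;
  }

  // Sample data: an old snapshot with 10 rows, a current one with 100 nulls.
  static Table sampleTable() {
    Table table = new Table();
    table.statsBySnapshot.put(1L, new SnapshotStats(10, 2));
    table.statsBySnapshot.put(2L, new SnapshotStats(500, 100));
    table.currentSnapshotId = 2L;
    return table;
  }

  public static void main(String[] args) {
    Table table = sampleTable();
    Long asOf = 1L; // time travel to the older snapshot
    System.out.println("rows (basic stats): " + basicRowCount(table, asOf));
    System.out.println("nulls (current snapshot): " + nullCountIgnoringTimeTravel(table));
    System.out.println("nulls (time travel aware): " + nullCountRespectingTimeTravel(table, asOf));
  }
}
```

In this toy model, the null count taken from the current snapshot exceeds the row count of the snapshot being queried, which is the kind of contradiction an optimizer cannot reason about sensibly.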
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Users will see better execution plans when they submit time travel 
queries.
   
   ### Is the change a dependency upgrade?
   
   No.
   
   ### How was this patch tested?
   
   I added a new qtest. [The first 
commit](https://github.com/apache/hive/commit/ee04e72f84006e81db7437b72a8a9952d1373ac5)
 includes a `q.out` reproducing the problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

