zhangbutao commented on code in PR #5400: URL: https://github.com/apache/hive/pull/5400#discussion_r1735677285

##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -613,7 +616,7 @@ public boolean canComputeQueryUsingStats(org.apache.hadoop.hive.ql.metadata.Tabl
         }
       }
     }
-    return false;
+    return true;

Review Comment:
   Good catch!
   When the table has delete files, the `analyze table compute stats` job can still produce accurate stats, because it launches a Tez task to compute them. After that job finishes, the HMS stats are updated and accurate, and `iceberg.hive.keep.stats` will be true, so we can use the HMS stats to optimize the `count` query.

   But if the statsSource is Iceberg and the table has delete files, the Iceberg `SnapshotSummary` is not updated even after `analyze table compute stats` has run, so we cannot optimize the `count` query.

   This looks a little weird: users run `analyze table compute stats` to refresh the stats, yet the `count` query still cannot be optimized when the statsSource is Iceberg and the table has delete files.
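
To make the two cases in the comment above concrete, here is a minimal Java sketch of the decision being discussed. It is not the code in `HiveIcebergStorageHandler`: the class `CountStatsSketch`, the enum `StatsSource`, the method `canAnswerCountFromStats`, and its boolean parameters are all hypothetical names introduced for illustration. The sketch also assumes that, without delete files, the Iceberg snapshot summary is sufficient to answer a `count` query, which the comment implies but does not state outright.

```java
public class CountStatsSketch {

  /** Hypothetical stand-in for the configured stats source; not Hive's actual type. */
  enum StatsSource { HMS, ICEBERG }

  /**
   * Decides whether a count query could be answered from stats alone.
   *
   * @param source           which stats the planner would read (assumed flag)
   * @param hmsStatsAccurate true once `analyze table compute stats` has refreshed HMS stats
   * @param hasDeleteFiles   true if the current snapshot contains delete files
   */
  static boolean canAnswerCountFromStats(StatsSource source,
                                         boolean hmsStatsAccurate,
                                         boolean hasDeleteFiles) {
    switch (source) {
      case HMS:
        // The analyze job computes row counts with a Tez task, so the HMS
        // stats are usable even when delete files exist.
        return hmsStatsAccurate;
      case ICEBERG:
        // The analyze job does not rewrite the snapshot summary, so with
        // delete files the summary cannot be trusted for a row count.
        return !hasDeleteFiles;
      default:
        return false;
    }
  }

  public static void main(String[] args) {
    // Same table state (analyzed, has delete files), different stats source.
    System.out.println(canAnswerCountFromStats(StatsSource.HMS, true, true));
    System.out.println(canAnswerCountFromStats(StatsSource.ICEBERG, true, true));
  }
}
```

Under these assumptions, the two calls in `main` differ only in the stats source, which is the asymmetry the comment describes: the same analyzed table qualifies for the optimization with HMS stats but not with Iceberg stats.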