zhangbutao commented on code in PR #5498:
URL: https://github.com/apache/hive/pull/5498#discussion_r1811819765

##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/StatsOptimizer.java:
##########
@@ -932,7 +932,7 @@ private Collection<List<ColumnStatisticsObj>> verifyAndGetPartColumnStats(
     private Long getRowCnt(
         ParseContext pCtx, TableScanOperator tsOp, Table tbl) throws HiveException {
       Long rowCnt = 0L;
-      if (tbl.isPartitioned()) {
+      if (tbl.isPartitioned() && StatsUtils.checkCanProvidePartitionStats(tbl)) {
         for (Partition part : pctx.getPrunedPartitions(
             tsOp.getConf().getAlias(), tsOp).getPartitions()) {
           if (!StatsUtils.areBasicStatsUptoDateForQueryAnswering(part.getTable(), part.getParameters())) {

Review Comment:
   `StatsUtils::areBasicStatsUptoDateForQueryAnswering` does not apply to Iceberg tables. It checks the table param `COLUMN_STATS_ACCURATE` to decide whether the stats can be used. But we always get partition stats from the Iceberg metadata file, so for Iceberg `COLUMN_STATS_ACCURATE` should always be `true`.

   The Iceberg qtests for table and partition stats only look good because we already set `COLUMN_STATS_ACCURATE` to true in hive-site.xml. In practice, I don't think any users will care about this param. So if we want to use Iceberg partition stats, we should consider removing this param.

   https://github.com/apache/hive/blob/48a67a4f2cc7a65bf9aac4a1ed518958c5b00027/data/conf/hive-site.xml#L334-L339

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
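
To illustrate the concern raised in the review comment, here is a minimal, self-contained sketch of the gating logic. It is not Hive's actual code: `StatsGateSketch`, `basicStatsMarkedAccurate`, and `canAnswerFromStats` are hypothetical names, the `COLUMN_STATS_ACCURATE` value is assumed to look like `{"BASIC_STATS":"true"}`, and the string check stands in for real JSON parsing. It only shows why a partition without the marker would be rejected unless the check is bypassed for stats that come from Iceberg metadata.

```java
import java.util.Map;

/** Hypothetical stand-in for the check discussed above; not Hive's StatsUtils. */
final class StatsGateSketch {
  static final String COLUMN_STATS_ACCURATE = "COLUMN_STATS_ACCURATE";

  /**
   * Simplified marker check: basic stats count as current only if the
   * parameter is present and flags BASIC_STATS as "true".
   * Assumption: the value is JSON-like; a naive string match replaces a parser.
   */
  static boolean basicStatsMarkedAccurate(Map<String, String> params) {
    String value = params.get(COLUMN_STATS_ACCURATE);
    if (value == null) {
      return false;
    }
    return value.replaceAll("\\s", "").contains("\"BASIC_STATS\":\"true\"");
  }

  /**
   * Sketch of the reviewer's point: when stats are read from Iceberg metadata
   * (modelled here as a plain boolean), the marker is irrelevant, so the
   * check is skipped.
   */
  static boolean canAnswerFromStats(boolean statsComeFromIcebergMetadata,
                                    Map<String, String> params) {
    if (statsComeFromIcebergMetadata) {
      return true;
    }
    return basicStatsMarkedAccurate(params);
  }

  public static void main(String[] args) {
    Map<String, String> noMarker = Map.of();
    // Without the marker, the plain check rejects the partition...
    System.out.println(canAnswerFromStats(false, noMarker));
    // ...while the Iceberg path does not depend on it.
    System.out.println(canAnswerFromStats(true, noMarker));
  }
}
```

With a gate of this shape, the qtests would no longer depend on the marker being set through hive-site.xml.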