zhangbutao commented on code in PR #5498:
URL: https://github.com/apache/hive/pull/5498#discussion_r1811819765
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/StatsOptimizer.java:
##########
@@ -932,7 +932,7 @@ private Collection<List<ColumnStatisticsObj>> verifyAndGetPartColumnStats(
   private Long getRowCnt(
       ParseContext pCtx, TableScanOperator tsOp, Table tbl) throws HiveException {
     Long rowCnt = 0L;
-    if (tbl.isPartitioned()) {
+    if (tbl.isPartitioned() && StatsUtils.checkCanProvidePartitionStats(tbl)) {
       for (Partition part : pctx.getPrunedPartitions(
           tsOp.getConf().getAlias(), tsOp).getPartitions()) {
         if (!StatsUtils.areBasicStatsUptoDateForQueryAnswering(part.getTable(), part.getParameters())) {
Review Comment:
`StatsUtils::areBasicStatsUptoDateForQueryAnswering` is not applicable to
Iceberg tables. It checks the `COLUMN_STATS_ACCURATE` param and then decides
whether the stats can be used. But we always get partition stats from the
Iceberg metadata files, so for Iceberg `COLUMN_STATS_ACCURATE` should always be
treated as `true`.
The only reason the Iceberg qtests for table and partition stats look good is
that we already set `COLUMN_STATS_ACCURATE` to true in hive-site.xml. In
practice, I don't think any user will care about this param. So if we want to
use Iceberg partition stats, I think we should consider removing this param.
https://github.com/apache/hive/blob/48a67a4f2cc7a65bf9aac4a1ed518958c5b00027/data/conf/hive-site.xml#L334-L339
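
To make the suggestion concrete, here is a minimal, self-contained sketch of the
idea. It is not the actual Hive code: `StatsCheckSketch`,
`basicStatsMarkedAccurate`, `canAnswerFromStats` and the
`statsFromTableMetadata` flag are hypothetical names, and the inspection of the
`COLUMN_STATS_ACCURATE` value is simplified to a substring match. The point is
only that the param-based check gets bypassed when the stats come from the
table format's own metadata.

```java
import java.util.Map;

// Hypothetical sketch; none of these names exist in Hive.
class StatsCheckSketch {
    // Key of the parameter that the up-to-date check inspects.
    static final String COLUMN_STATS_ACCURATE = "COLUMN_STATS_ACCURATE";

    // Simplified stand-in for the param-based check (assumption: the value
    // carries a BASIC_STATS marker; real code would parse it properly).
    static boolean basicStatsMarkedAccurate(Map<String, String> params) {
        String value = params.get(COLUMN_STATS_ACCURATE);
        return value != null && value.contains("\"BASIC_STATS\":\"true\"");
    }

    // Stats read from the table format's metadata are trusted as-is;
    // otherwise fall back to the param-based check.
    static boolean canAnswerFromStats(boolean statsFromTableMetadata,
                                      Map<String, String> params) {
        return statsFromTableMetadata || basicStatsMarkedAccurate(params);
    }

    public static void main(String[] args) {
        Map<String, String> noMarker = Map.of();
        Map<String, String> marked =
            Map.of(COLUMN_STATS_ACCURATE, "{\"BASIC_STATS\":\"true\"}");

        // Metadata-backed table without the param: still answerable.
        System.out.println(canAnswerFromStats(true, noMarker));
        // Regular table without the param: not answerable.
        System.out.println(canAnswerFromStats(false, noMarker));
        // Regular table with the param set: answerable.
        System.out.println(canAnswerFromStats(false, marked));
    }
}
```

With a check shaped like this, the qtests would not need
`COLUMN_STATS_ACCURATE` to be set for Iceberg tables.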