SourabhBadhya commented on code in PR #4397:
URL: https://github.com/apache/hive/pull/4397#discussion_r1223771475


##########
ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java:
##########
@@ -220,8 +220,10 @@ public int persistColumnStats(Hive db, Table tbl) throws HiveException, MetaExce
       start = System.currentTimeMillis();
       if (tbl != null && tbl.isNonNative() && tbl.getStorageHandler().canSetColStatistics(tbl)) {
         tbl.getStorageHandler().setColStatistics(tbl, colStats);
+      } else {
+        // Set table or partition column statistics in metastore.
+        db.setPartitionColumnStatistics(request);
       }
-      db.setPartitionColumnStatistics(request);

Review Comment:
   @zhangbutao I agree with your point. However, storing stats in 2 places has 
its pros and cons.
   Pros:
   1. We can fall back to the metastore by setting 
`hive.iceberg.stats.source=metastore` if we are not able to get stats from 
Puffin files.
   
   Cons:
   1. Changes made to Puffin files by external clients are not visible to the 
metastore.
   2. The extra metastore DB calls needed to store column stats add a 
performance cost.
   
   With the approach in this PR, users who cannot get stats from Puffin files 
and want to read them from the metastore instead can set 
`hive.iceberg.stats.source=metastore` and then run `ANALYZE TABLE <tableName> 
COMPUTE STATISTICS FOR COLUMNS`. (This costs one additional ANALYZE query.)
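
   As a rough sketch, that fallback would look like this in a Hive session. 
`<tableName>` is a placeholder, and the property name and value are the ones 
quoted in this comment; this assumes the property can be set at session level.

   ```sql
   -- Switch the stats source from Puffin files to the metastore.
   SET hive.iceberg.stats.source=metastore;

   -- Recompute column stats so that they are written to the metastore.
   ANALYZE TABLE <tableName> COMPUTE STATISTICS FOR COLUMNS;
   ```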
   
   I will leave it to the community to decide whether it is best to store stats 
in 2 places or whether storing them in a single place is sufficient. If the 
community thinks it is best to store them in 2 places, then I won't proceed 
further. Otherwise, I will continue with the patch.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

