kishendas commented on a change in pull request #1186:
URL: https://github.com/apache/hive/pull/1186#discussion_r447465847

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##########
@@ -2264,6 +2255,8 @@ private MergedColumnStatsForPartitions mergeColStatsForPartitions(String catName
     if (colStatsMap.size() < 1) {
       LOG.debug("No stats data found for: dbName={} tblName= {} partNames= {} colNames= ", dbName, tblName, partNames, colNames);
+      // TODO: If we don't find any stats then most likely we should return null. Returning an empty object will not
+      // trigger the lookup in the raw store and we will end up with missing stats.

Review comment:
       Please create a JIRA for this TODO and add a reference to it in the comment.

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##########
@@ -851,8 +851,6 @@ private void updateTableColStats(RawStore rawStore, String catName, String dbNam
       sharedCache.refreshTableColStatsInCache(StringUtils.normalizeIdentifier(catName), StringUtils.normalizeIdentifier(dbName), StringUtils.normalizeIdentifier(tblName), tableColStats.getStatsObj());
-      // Update the table to get consistent stats state.
-      sharedCache.alterTableInCache(catName, dbName, tblName, table);

Review comment:
       Sorry, I am a bit confused looking at this diff. The original issue seems to be: "Metastore's update service wrongly strips partition column stats from the cache in an attempt to update them." How are we fixing this issue by not updating the stats in the cache? Wouldn't the right fix be to ensure that the sharedCache.alterTableInCache and sharedCache.alterPartitionInCache methods do the right thing and do not incorrectly remove partition column stats?

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##########
@@ -900,13 +898,6 @@ private void updateTablePartitionColStats(RawStore rawStore, String catName, Str
       rawStore.getPartitionColumnStatistics(catName, dbName, tblName, partNames, colNames, CacheUtils.HIVE_ENGINE);
       Deadline.stopTimer();
       sharedCache.refreshPartitionColStatsInCache(catName, dbName, tblName, partitionColStats);
-      Deadline.startTimer("getPartitionsByNames");
-      List<Partition> parts = rawStore.getPartitionsByNames(catName, dbName, tblName, partNames);
-      Deadline.stopTimer();
-      // Also save partitions for consistency as they have the stats state.
-      for (Partition part : parts) {
-        sharedCache.alterPartitionInCache(catName, dbName, tblName, part.getValues(), part);

Review comment:
       Same concern here as in the previous comment. How do we fix this issue by not updating the cache at all?
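
For reference, the TODO in mergeColStatsForPartitions is about how a cache miss is signalled to the caller. The following is only a minimal sketch of that concern, assuming the caller falls back to the raw store when, and only when, the cache returns null. Every name in it (StatsLookupSketch, StatsSource, lookup, fromCache) is a hypothetical stand-in and not part of the CachedStore, SharedCache, or RawStore APIs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are Hive APIs.
class StatsLookupSketch {
  /** Stand-in for a backing store that always has the stats. */
  interface StatsSource {
    Map<String, Long> getStats(List<String> colNames);
  }

  // The cache is deliberately left empty so that every lookup is a miss.
  private final Map<String, Long> cache = new HashMap<>();
  private final StatsSource rawStore;
  int rawStoreCalls = 0;

  StatsLookupSketch(StatsSource rawStore) {
    this.rawStore = rawStore;
  }

  /** Collects cached stats; reports a miss as null or as an empty map. */
  private Map<String, Long> fromCache(List<String> colNames, boolean nullOnMiss) {
    Map<String, Long> found = new HashMap<>();
    for (String col : colNames) {
      Long value = cache.get(col);
      if (value != null) {
        found.put(col, value);
      }
    }
    if (found.isEmpty() && nullOnMiss) {
      return null;
    }
    return found;
  }

  /** Assumed caller contract: only a null result triggers the raw store lookup. */
  Map<String, Long> lookup(List<String> colNames, boolean nullOnMiss) {
    Map<String, Long> cached = fromCache(colNames, nullOnMiss);
    if (cached != null) {
      return cached; // an empty map also takes this branch, so no fallback happens
    }
    rawStoreCalls++;
    return rawStore.getStats(colNames);
  }

  public static void main(String[] args) {
    StatsLookupSketch sketch =
        new StatsLookupSketch(names -> Collections.singletonMap("id", 42L));
    List<String> cols = Arrays.asList("id");
    System.out.println(sketch.lookup(cols, false)); // prints {}
    System.out.println(sketch.lookup(cols, true));  // prints {id=42}
  }
}
```

Under that assumed contract, the empty-map variant silently yields no stats, while the null variant reaches the backing store. Whether CachedStore's callers actually follow this contract would need to be checked against the real code.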