kishendas commented on a change in pull request #1186:
URL: https://github.com/apache/hive/pull/1186#discussion_r447465847
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##########
@@ -2264,6 +2255,8 @@ private MergedColumnStatsForPartitions mergeColStatsForPartitions(String catName
if (colStatsMap.size() < 1) {
LOG.debug("No stats data found for: dbName={} tblName= {} partNames= {} colNames= ", dbName, tblName, partNames,
colNames);
+ // TODO: If we don't find any stats then most likely we should return null. Returning an empty object will not
+ // trigger the lookup in the raw store and we will end up with missing stats.
Review comment:
Please create a JIRA for this TODO and add a reference in the comment.
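To illustrate the concern in the TODO above, here is a minimal sketch of why the miss signal matters. It is not the CachedStore code: the class, the maps, and the method names are hypothetical stand-ins, and it assumes the caller falls back to the raw store only when the cache lookup returns null.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names exist in Hive. */
public class StatsLookupSketch {
  // Stand-in for the raw store, which holds the authoritative stats.
  static final Map<String, List<String>> RAW_STORE = new HashMap<>();
  // Stand-in for the cache, which may be missing entries.
  static final Map<String, List<String>> CACHE = new HashMap<>();

  /** Returns null on a miss, so the caller knows to fall back. */
  static List<String> cachedStatsOrNull(String table) {
    List<String> stats = CACHE.get(table);
    return (stats == null || stats.isEmpty()) ? null : stats;
  }

  /** Returns an empty list on a miss, which looks like "no stats exist". */
  static List<String> cachedStatsOrEmpty(String table) {
    return CACHE.getOrDefault(table, Collections.emptyList());
  }

  /** Assumed caller behavior: consult the raw store only on a null result. */
  static List<String> getStats(String table, boolean nullOnMiss) {
    List<String> fromCache = nullOnMiss ? cachedStatsOrNull(table) : cachedStatsOrEmpty(table);
    if (fromCache != null) {
      return fromCache;
    }
    return RAW_STORE.getOrDefault(table, Collections.emptyList());
  }
}
```

With stats present only in the raw-store stand-in, `getStats(table, true)` reaches them, while `getStats(table, false)` returns the empty list and never falls back.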
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##########
@@ -851,8 +851,6 @@ private void updateTableColStats(RawStore rawStore, String catName, String dbNam
sharedCache.refreshTableColStatsInCache(StringUtils.normalizeIdentifier(catName),
StringUtils.normalizeIdentifier(dbName),
StringUtils.normalizeIdentifier(tblName),
tableColStats.getStatsObj());
- // Update the table to get consistent stats state.
- sharedCache.alterTableInCache(catName, dbName, tblName, table);
Review comment:
Sorry, I am a bit confused by this diff. The original issue seems to be:
"Metastore's update service wrongly strips partition column stats from the
cache in an attempt to update them." How does not updating the stats in the
cache fix that? Wouldn't the right fix be to ensure that the
sharedCache.alterTableInCache and sharedCache.alterPartitionInCache methods do
the right thing and do not incorrectly remove partition column stats?
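To make the suggestion concrete, here is a rough sketch of the difference between an alter that strips column stats and one that preserves them. It assumes the stripping happens because the alter path replaces the whole cached entry; I have not confirmed that this is what alterTableInCache does. Every name below is hypothetical and not part of SharedCache.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not the SharedCache implementation. */
public class AlterPreservesStatsSketch {
  /** Assumed cached entry: table metadata plus per-column stats. */
  static final class Entry {
    String tableMetadata;
    final Map<String, String> colStats = new HashMap<>();

    Entry(String tableMetadata) {
      this.tableMetadata = tableMetadata;
    }
  }

  private final Map<String, Entry> cache = new HashMap<>();

  /** Stores stats for one column, creating the entry if needed. */
  void refreshColStats(String table, String column, String stats) {
    cache.computeIfAbsent(table, t -> new Entry(null)).colStats.put(column, stats);
  }

  /** Replaces the whole entry, so previously cached column stats are lost. */
  void alterTableDroppingStats(String table, String newMetadata) {
    cache.put(table, new Entry(newMetadata));
  }

  /** Updates only the metadata and leaves the cached column stats in place. */
  void alterTablePreservingStats(String table, String newMetadata) {
    cache.computeIfAbsent(table, t -> new Entry(newMetadata)).tableMetadata = newMetadata;
  }

  /** Returns the cached column stats, or an empty map if the table is not cached. */
  Map<String, String> colStats(String table) {
    Entry entry = cache.get(table);
    return entry == null ? Collections.<String, String>emptyMap() : entry.colStats;
  }
}
```

If the real methods behaved like the preserving variant, the update service could keep calling them after refreshing the stats without losing anything.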
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##########
@@ -900,13 +898,6 @@ private void updateTablePartitionColStats(RawStore rawStore, String catName, Str
rawStore.getPartitionColumnStatistics(catName, dbName, tblName,
partNames, colNames, CacheUtils.HIVE_ENGINE);
Deadline.stopTimer();
sharedCache.refreshPartitionColStatsInCache(catName, dbName, tblName, partitionColStats);
- Deadline.startTimer("getPartitionsByNames");
- List<Partition> parts = rawStore.getPartitionsByNames(catName, dbName, tblName, partNames);
- Deadline.stopTimer();
- // Also save partitions for consistency as they have the stats state.
- for (Partition part : parts) {
- sharedCache.alterPartitionInCache(catName, dbName, tblName, part.getValues(), part);
Review comment:
Same concern here as in the previous comment: how does not updating the
cache at all fix this issue?