maheshk114 commented on a change in pull request #1834:
URL: https://github.com/apache/hive/pull/1834#discussion_r558397066
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java
##########
@@ -204,6 +206,54 @@ public int persistColumnStats(Hive db, Table tbl) throws HiveException, MetaExce
public void setDpPartSpecs(Collection<Partition> dpPartSpecs) {
}
+ public static boolean canSkipStatsGeneration(String dbName, String tblName, String partName,
+ long statsWriteId, String queryValidWriteIdList) {
+ if (queryValidWriteIdList != null) { // Can be null if its not an ACID table.
+ ValidWriteIdList validWriteIdList = new ValidReaderWriteIdList(queryValidWriteIdList);
+ // Just check if the write ID is valid. If it's valid (i.e. we are allowed to see it),
+ // that means it cannot possibly be a concurrent write. As stats optimization is enabled
+ // only in case auto gather is enabled. Thus the stats must be updated by a valid committed
+ // transaction and stats generation can be skipped.
+ if (validWriteIdList.isWriteIdValid(statsWriteId)) {
+ try {
+ IMetaStoreClient msc = Hive.get().getMSC();
+ TxnState state = msc.findStatStatusByWriteId(dbName, tblName, partName, statsWriteId);
Review comment:
There are two cases where we need to compute stats again:
1. The table was updated by a txn with auto gather stats set to false. In
that case, the completed txn table contains a txn with an id greater than the
stats write id (txn). The findStatStatusByWriteId method checks for this.
2. The table stats were updated by an aborted txn. That can normally be checked
using the valid write id list, but if compaction has already cleaned up the txn
info, the list alone cannot tell us, so we have to check the completed txn
table. The findStatStatusByWriteId method does this as well.
If the stats were updated by a read-only txn (that is, the stats updater), we
also need not compute the stats again. Info about read-only txns is not present
in the valid write id list, so this check is also done in the
findStatStatusByWriteId method.
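
The cases above can be sketched as a small, self-contained decision function. This is only an illustration of the reasoning in this comment, not the code in the PR: `StatsSkipSketch`, `WriterState`, `CompletedWrite`, and `canSkipStats` are hypothetical names, the list stands in for the completed txn table restricted to one table or partition, and treating an unknown stats writer as "recompute" is an assumption made to stay conservative.

```java
import java.util.List;

public class StatsSkipSketch {

    /** Hypothetical states of the txn that last wrote the stats (not Hive's TxnState). */
    enum WriterState { COMMITTED, ABORTED, READ_ONLY_STATS_UPDATER }

    /** Hypothetical entry of the completed txn table for a single table/partition. */
    static final class CompletedWrite {
        final long writeId;
        final WriterState state;

        CompletedWrite(long writeId, WriterState state) {
            this.writeId = writeId;
            this.state = state;
        }
    }

    /**
     * Returns true when the existing stats can be reused.
     * Simplification: any completed entry with a larger id counts as a later update.
     */
    static boolean canSkipStats(long statsWriteId, List<CompletedWrite> completed) {
        WriterState statsWriter = null;
        for (CompletedWrite w : completed) {
            if (w.writeId > statsWriteId) {
                // Case 1: a later txn touched the table without refreshing the stats.
                return false;
            }
            if (w.writeId == statsWriteId) {
                statsWriter = w.state;
            }
        }
        // Case 2: an aborted writer means the stats cannot be trusted.
        // Assumption: if the writer is not found at all, recompute to be safe.
        // A committed writer or a read-only stats updater means the stats are usable.
        return statsWriter == WriterState.COMMITTED
            || statsWriter == WriterState.READ_ONLY_STATS_UPDATER;
    }
}
```

For example, calling `canSkipStats(5, ...)` with a single `READ_ONLY_STATS_UPDATER` entry at id 5 allows the skip, while adding any entry with id 7 forces recomputation.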