InvisibleProgrammer commented on code in PR #5567:
URL: https://github.com/apache/hive/pull/5567#discussion_r1867281148
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
##########
@@ -10296,18 +10297,23 @@ public Map<String, String> updateTableColumnStatistics(ColumnStatistics colStats
}
// TODO: (HIVE-20109) ideally the col stats stats should be in colstats, not in the table!
- // Set the table properties
- // No need to check again if it exists.
- String dbname = table.getDbName();
- String name = table.getTableName();
- MTable oldt = mTable;
Map<String, String> newParams = new HashMap<>(table.getParameters());
- StatsSetupConst.setColumnStatsState(newParams, colNames);
- boolean isTxn = TxnUtils.isTransactionalTable(oldt.getParameters());
- if (isTxn) {
- if (!areTxnStatsSupported) {
- StatsSetupConst.setBasicStatsState(newParams, StatsSetupConst.FALSE);
- } else {
+
+ int retries = 3;
+ boolean success = false;
+ while (!success && retries > 0) {
Review Comment:
No.
```
Summary:
updateTableColumnStatistics can throw
SQLIntegrityConstraintViolationException during replication if HA is on and two
different HMS instances get the same call but with different engines.
Workaround:
Update table column statistics in a single thread.
Details:
updateTableColumnStatistics has a relatively long-running transaction. In that
transaction, it validates the actual parameters, queries the metastore db
against the TABLE_PARAMS that are already stored and makes a decision based on
that. After this, it uses DataNucleus to persist the new statistics.
Of the two HMS instances, one can save the column statistics. The
other cannot, as the first instance has already saved them.
```
The point is that both process A and process B decide to store the new
data. At the db level, that is an insert. Process A commits the insert. Process B
fails with a constraint violation because the row already exists. If we retry
process B, it queries the current state of the statistics again, so this time it
decides to do an update instead of an insert.
Unfortunately, DataNucleus has no notion of an upsert; that would make this
much easier...
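
To make the retry argument concrete, here is a minimal, self-contained sketch of the pattern. It is not the code in this PR: the class name, the in-memory `STATS` map standing in for the metastore db, and the `concurrentWriter` hook that simulates process A committing between process B's read and write are all hypothetical and exist only for illustration.

```java
import java.sql.SQLIntegrityConstraintViolationException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RetryUpsertSketch {

  // Hypothetical stand-in for the statistics table, keyed by column name.
  static final Map<String, String> STATS = new ConcurrentHashMap<>();

  // Insert that fails like a unique-key violation when the row already exists.
  static void insert(String key, String value)
      throws SQLIntegrityConstraintViolationException {
    if (STATS.putIfAbsent(key, value) != null) {
      throw new SQLIntegrityConstraintViolationException("duplicate key: " + key);
    }
  }

  static void update(String key, String value) {
    STATS.put(key, value);
  }

  // Read the current state, decide insert vs. update, and retry on a
  // constraint violation. Returns the number of attempts that were needed.
  static int saveWithRetry(String key, String value, Runnable concurrentWriter) {
    int retries = 3;
    int attempts = 0;
    SQLIntegrityConstraintViolationException last = null;
    while (retries > 0) {
      attempts++;
      // The state is re-read on every attempt, so a retry sees the other commit.
      boolean exists = STATS.containsKey(key);
      if (attempts == 1) {
        // Simulate process A committing after our read but before our write.
        concurrentWriter.run();
      }
      try {
        if (exists) {
          update(key, value);
        } else {
          insert(key, value);
        }
        return attempts;
      } catch (SQLIntegrityConstraintViolationException e) {
        last = e;
        retries--;
      }
    }
    throw new IllegalStateException("gave up after " + attempts + " attempts", last);
  }

  public static void main(String[] args) {
    int attempts = saveWithRetry("col_a", "stats-from-B",
        () -> STATS.put("col_a", "stats-from-A"));
    // prints "2 attempts, final value: stats-from-B"
    System.out.println(attempts + " attempts, final value: " + STATS.get("col_a"));
  }
}
```

The first attempt decides on an insert from a stale read and hits the constraint violation; the second attempt re-reads, sees the row, and updates it. The sketch only shows why a bounded retry is enough here, not how the real transaction should be rolled back and reopened between attempts.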
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]