pvary commented on a change in pull request #2547:
URL: https://github.com/apache/hive/pull/2547#discussion_r707165193
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -531,7 +531,7 @@ public void cleanTxnToWriteIdTable() throws MetaException {
String s = "SELECT MIN(\"RES\".\"ID\") AS \"ID\" FROM (" +
"SELECT MAX(\"TXN_ID\") + 1 AS \"ID\" FROM \"TXNS\" " +
"UNION " +
- "SELECT MIN(\"WS_COMMIT_ID\") AS \"ID\" FROM \"WRITE_SET\" " +
+ "SELECT MIN(\"WS_TXNID\") AS \"ID\" FROM \"WRITE_SET\" " +
Review comment:
So I think the theory is this:
- Original: Minimum open transactionId which was open when any of the
currently executing transactions started.
- New: Minimum open transactionId which was open when the last write
committed.
So originally we removed rows whose transaction was older than all
open/aborted transactions, since even the running transactions did not need to
see those older changes, but this was based on the start time of the
transaction.
With the new change we are only keeping the `WRITE_SET` based on the
transactionId when the actual write was committed (not when the actual write
was started), as the previous queries should not read the non-committed
folder/data anyway.
Question: What happens with the `WRITE_SET` when a compaction happens on
this table? (We are relying here on the fact that we are the only ones removing
rows from this table.)
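
To make the difference between the two columns concrete, here is a minimal in-memory sketch of the two `UNION` branches visible in the diff. It is only an illustration, not the handler's code: the `Row` class, the `lowWaterMark` helper, and the sample ids are all hypothetical, and the remaining part of the query that the diff does not show is not modelled.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.function.ToLongFunction;
import java.util.stream.LongStream;

class WriteSetLowWaterMark {
    /** Hypothetical stand-in for one WRITE_SET row (not the real schema class). */
    static final class Row {
        final long wsTxnId;     // models "WS_TXNID"
        final long wsCommitId;  // models "WS_COMMIT_ID"
        Row(long wsTxnId, long wsCommitId) {
            this.wsTxnId = wsTxnId;
            this.wsCommitId = wsCommitId;
        }
    }

    /**
     * Models only the two branches shown in the diff:
     * MAX(TXN_ID) + 1 over TXNS, and MIN(column) over WRITE_SET,
     * followed by the outer MIN. Empty inputs contribute no candidate,
     * similar to how the outer MIN skips NULLs in SQL.
     */
    static OptionalLong lowWaterMark(List<Long> txnIds, List<Row> writeSet,
                                     ToLongFunction<Row> column) {
        LongStream.Builder candidates = LongStream.builder();
        txnIds.stream().mapToLong(Long::longValue).max()
              .ifPresent(max -> candidates.accept(max + 1));
        writeSet.stream().mapToLong(column).min()
                .ifPresent(candidates::accept);
        return candidates.build().min();
    }

    public static void main(String[] args) {
        // Made-up sample data: highest txn id is 10; two writes that
        // started in txns 5 and 7 and committed with ids 8 and 9.
        List<Long> txns = List.of(1L, 2L, 10L);
        List<Row> writeSet = List.of(new Row(5, 8), new Row(7, 9));

        System.out.println(lowWaterMark(txns, writeSet, r -> r.wsCommitId).getAsLong()); // 8
        System.out.println(lowWaterMark(txns, writeSet, r -> r.wsTxnId).getAsLong());    // 5
    }
}
```

With this sample data the `WS_TXNID` variant yields a lower value than the `WS_COMMIT_ID` variant, because every write starts before it commits.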
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]