pvary commented on a change in pull request #2547:
URL: https://github.com/apache/hive/pull/2547#discussion_r707165193



##########
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -531,7 +531,7 @@ public void cleanTxnToWriteIdTable() throws MetaException {
         String s = "SELECT MIN(\"RES\".\"ID\") AS \"ID\" FROM (" +
             "SELECT MAX(\"TXN_ID\") + 1 AS \"ID\" FROM \"TXNS\" " +
             "UNION " +
-            "SELECT MIN(\"WS_COMMIT_ID\") AS \"ID\" FROM \"WRITE_SET\" " +
+            "SELECT MIN(\"WS_TXNID\") AS \"ID\" FROM \"WRITE_SET\" " +

Review comment:
       So I think the theory is this:
   - Original: the minimum transactionId that was open when any of the 
currently executing transactions started.
   - New: the minimum transactionId that was open when the last write 
committed.
   
   So originally we removed the lines whose transaction was older than every 
open/aborted transaction, so that even the running transactions no longer 
needed to see those older changes, but this was based on the start time of the 
transaction.
   
   With the new change we only keep `WRITE_SET` lines based on the 
transactionId at which the actual write was committed (not the one at which 
the write was started), since earlier queries should not read the uncommitted 
folder/data anyway.
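   
   To make the difference concrete, here is a rough in-memory sketch of only 
the two `UNION` branches visible in this hunk. The class, the `WriteSetRow` 
type, and the sample ids are hypothetical and only for illustration; the real 
query runs against the metastore tables and may contain further branches that 
are not shown here.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.stream.LongStream;

// Hypothetical helper, not part of CompactionTxnHandler.
final class WriteSetWatermarkSketch {

    // Hypothetical in-memory stand-in for one WRITE_SET row.
    static final class WriteSetRow {
        final long wsTxnId;
        final long wsCommitId;

        WriteSetRow(long wsTxnId, long wsCommitId) {
            this.wsTxnId = wsTxnId;
            this.wsCommitId = wsCommitId;
        }
    }

    // Mirrors MIN( MAX(TXN_ID) + 1 , MIN(<WRITE_SET column>) ).
    // SQL aggregates skip NULLs, so an empty input contributes no candidate.
    static OptionalLong watermark(List<Long> txnIds,
                                  List<WriteSetRow> writeSet,
                                  boolean useCommitId) {
        OptionalLong maxTxn = txnIds.stream().mapToLong(Long::longValue).max();
        OptionalLong minWs = writeSet.stream()
            .mapToLong(r -> useCommitId ? r.wsCommitId : r.wsTxnId)
            .min();

        LongStream.Builder candidates = LongStream.builder();
        maxTxn.ifPresent(m -> candidates.accept(m + 1));
        minWs.ifPresent(candidates::accept);
        return candidates.build().min();
    }

    public static void main(String[] args) {
        // Made-up ids: two writers that started at 5 and 7, committed at 9 and 8.
        List<Long> txns = List.of(10L, 11L, 12L);
        List<WriteSetRow> ws = List.of(new WriteSetRow(5, 9), new WriteSetRow(7, 8));

        System.out.println("WS_COMMIT_ID branch: " + watermark(txns, ws, true));
        System.out.println("WS_TXNID branch:     " + watermark(txns, ws, false));
    }
}
```

   With these made-up rows the two branches yield different minimums, which is 
the kind of shift the theory above is trying to explain.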
   
   Question: what happens to the `WRITE_SET` when a compaction has happened on 
this table? (We are relying here on the fact that we are the only ones removing 
lines from this table.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


