Eugene Koifman created HIVE-14427:
-------------------------------------

             Summary: CompactionTxnHandler.markCleaned() can delete aborted txns
                 Key: HIVE-14427
                 URL: https://issues.apache.org/jira/browse/HIVE-14427
             Project: Hive
          Issue Type: Improvement
          Components: Transactions
            Reporter: Eugene Koifman

We can modify
{noformat}
s = "select distinct txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and txn_state = '"
  + TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and tc_table = '" + info.tableName + "'"
  + (info.highestTxnId == 0 ? "" : " and txn_id <= " + info.highestTxnId);
{noformat}
to use "select txn_id, count(*) ... group by txn_id" so that we know the number of components in each txn. Then, when running "delete from TXN_COMPONENTS where...", we know how many rows were deleted. If the sum of all counts from the first query matches the total number of rows deleted, we know that all aborted txns in this set are now empty and thus can be deleted here.

This means we clean up aborted txns from the TXNS table sooner and avoid a large join in _cleanEmptyAbortedTxns()_. Also, the delete on TXNS here will have PKs in its WHERE clause, so it should be cheap.
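
A minimal sketch of the check proposed above, under stated assumptions: the class and method names (AbortedTxnCleanupSketch, buildCountQuery, txnsSafeToDelete, buildDeleteTxnsSql) are hypothetical and are not part of CompactionTxnHandler. The per-txn counts are assumed to come from the grouped query, and rowsDeleted from the update count returned by the "delete from TXN_COMPONENTS" statement. The query builder simply mirrors the string concatenation used in the existing code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Hive.
final class AbortedTxnCleanupSketch {

    // Grouped form of the original query: one row per aborted txn,
    // with the number of matching TXN_COMPONENTS rows.
    static String buildCountQuery(char abortedState, String dbName,
                                  String tableName, long highestTxnId) {
        return "select txn_id, count(*) from TXNS, TXN_COMPONENTS where txn_id = tc_txnid"
            + " and txn_state = '" + abortedState + "'"
            + " and tc_database = '" + dbName + "'"
            + " and tc_table = '" + tableName + "'"
            + (highestTxnId == 0 ? "" : " and txn_id <= " + highestTxnId)
            + " group by txn_id";
    }

    // Returns the txn ids that may be deleted from TXNS, or an empty list
    // if the deleted row count does not match the sum of the counts.
    static List<Long> txnsSafeToDelete(Map<Long, Integer> componentCounts, int rowsDeleted) {
        long expected = 0;
        for (int count : componentCounts.values()) {
            expected += count;
        }
        if (componentCounts.isEmpty() || expected != rowsDeleted) {
            // Mismatch: leave these txns for the existing cleanup path.
            return Collections.emptyList();
        }
        List<Long> ids = new ArrayList<>(componentCounts.keySet());
        Collections.sort(ids);
        return ids;
    }

    // Delete keyed on the primary key of TXNS, as the description suggests.
    static String buildDeleteTxnsSql(List<Long> txnIds) {
        if (txnIds.isEmpty()) {
            throw new IllegalArgumentException("no txn ids to delete");
        }
        StringBuilder sql = new StringBuilder("delete from TXNS where txn_id in (");
        for (int i = 0; i < txnIds.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append(txnIds.get(i));
        }
        return sql.append(")").toString();
    }
}
```

For example, with counts {5 -> 1, 7 -> 2} and 3 rows deleted, txnsSafeToDelete returns [5, 7] and the delete statement can be issued. With any other deleted count it returns an empty list, and the txns are left to _cleanEmptyAbortedTxns()_.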