[ https://issues.apache.org/jira/browse/HIVE-14427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427526#comment-15427526 ]
Barna Zsombor Klara commented on HIVE-14427:
--------------------------------------------
Hi [~ekoifman],
Are you currently working on this, or could I take a look at it?
Thanks,
Zsombor
> CompactionTxnHandler.markCleaned() can delete aborted txns
> ----------------------------------------------------------
>
> Key: HIVE-14427
> URL: https://issues.apache.org/jira/browse/HIVE-14427
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Reporter: Eugene Koifman
>
> We can modify
> {noformat}
> s = "select distinct txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and txn_state = '" +
>   TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and tc_table = '" +
>   info.tableName + "'" + (info.highestTxnId == 0 ? "" : " and txn_id <= " + info.highestTxnId);
> {noformat}
> to use "select txn_id, count(*) ... group by txn_id" so that we know the number
> of components in each txn.
> Then, when running "delete from TXN_COMPONENTS where ...", we know how many rows
> were deleted.
> If the sum of all counts from the first query matches the total number of rows
> deleted, we know that all aborted txns in this set are now empty and can
> therefore be deleted here.
> This means we clean up aborted txns from the TXNS table sooner and avoid a large
> join in _cleanEmptyAbortedTxns()_. Also, the delete on TXNS here will have
> PKs in the WHERE clause, so it should be cheap.
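
The check described above could be sketched roughly as follows. This is only an illustration of the counting idea, not the actual CompactionTxnHandler code: the class and method names are hypothetical, the map stands in for the result of the proposed "select txn_id, count(*) ... group by txn_id" query, and rowsDeleted stands in for the row count returned by the delete on TXN_COMPONENTS (for example, from Statement.executeUpdate()).

```java
import java.util.Map;
import java.util.Optional;
import java.util.StringJoiner;

// Hypothetical sketch; none of these names exist in Hive.
final class AbortedTxnCleanupSketch {

    /** Sums the per-txn component counts returned by the grouped query. */
    static long expectedComponentRows(Map<Long, Long> componentCountByTxn) {
        long total = 0;
        for (long count : componentCountByTxn.values()) {
            total += count;
        }
        return total;
    }

    /**
     * True when the delete on TXN_COMPONENTS removed exactly as many rows as
     * the grouped query counted, i.e. every txn in the set is now empty.
     */
    static boolean allTxnsEmptied(Map<Long, Long> componentCountByTxn, long rowsDeleted) {
        return !componentCountByTxn.isEmpty()
            && expectedComponentRows(componentCountByTxn) == rowsDeleted;
    }

    /**
     * Builds the delete on TXNS keyed by txn_id (the PK), or returns empty
     * when the counts do not match and the txns must be left alone.
     */
    static Optional<String> buildDeleteFromTxns(Map<Long, Long> componentCountByTxn,
                                                long rowsDeleted) {
        if (!allTxnsEmptied(componentCountByTxn, rowsDeleted)) {
            return Optional.empty();
        }
        StringJoiner ids = new StringJoiner(", ", "(", ")");
        for (long txnId : componentCountByTxn.keySet()) {
            ids.add(Long.toString(txnId));
        }
        return Optional.of("delete from TXNS where txn_id in " + ids);
    }
}
```

If the counts differ, at least one txn in the set still has components, so the sketch produces no delete and those txns would be left to _cleanEmptyAbortedTxns()_ as before.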