[ 
https://issues.apache.org/jira/browse/HIVE-14427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435409#comment-15435409
 ] 

Barna Zsombor Klara commented on HIVE-14427:
--------------------------------------------

Here is my take on it:
https://reviews.apache.org/r/51381/

> CompactionTxnHandler.markCleaned() can delete aborted txns
> ----------------------------------------------------------
>
>                 Key: HIVE-14427
>                 URL: https://issues.apache.org/jira/browse/HIVE-14427
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>            Reporter: Eugene Koifman
>
> We can modify 
> {noformat}
> s = "select distinct txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and txn_state = '" +
>     TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and tc_table = '" +
>     info.tableName + "'" + (info.highestTxnId == 0 ? "" : " and txn_id <= " + info.highestTxnId);
> {noformat}
> to use "select txn_id, count(*) ... group by txn_id", so that we know the
> number of components in each txn.
> Then, when running "delete from TXN_COMPONENTS where...", we know how many
> rows were deleted.
> If the sum of all counts from the first query matches the total number of
> rows deleted, we know that all aborted txns in this set are now empty and
> can therefore be deleted here.
> This means we clean up aborted txns from the TXNS table sooner and avoid a
> large join in _cleanEmptyAbortedTxns()_.  Also, the delete on TXNS here will
> have PKs in the WHERE clause, so it should be cheap.
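
The check described in the issue can be sketched in isolation as follows. This is
only an illustration of the counting idea, not the patch under review and not
Hive's actual code: the class AbortedTxnCleanupSketch and its methods are
hypothetical names, the map stands in for the result of the
"select txn_id, count(*) ... group by txn_id" query, and rowsDeleted stands in
for the update count returned by the delete on TXN_COMPONENTS. It also assumes
the per-txn counts cover every component row of each txn.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of CompactionTxnHandler.
class AbortedTxnCleanupSketch {

  // Sum the per-txn component counts returned by the grouped select.
  static long totalComponents(Map<Long, Long> componentCounts) {
    long total = 0;
    for (long count : componentCounts.values()) {
      total += count;
    }
    return total;
  }

  // True when the delete on TXN_COMPONENTS removed exactly as many rows as
  // the grouped select counted, i.e. every txn in the set is now empty.
  // Assumption: the counts include all component rows of each txn.
  static boolean allTxnsEmptied(Map<Long, Long> componentCounts, long rowsDeleted) {
    return !componentCounts.isEmpty()
        && totalComponents(componentCounts) == rowsDeleted;
  }

  // Build the follow-up delete on TXNS. The WHERE clause lists txn_id values
  // only, which is the "PKs in WHERE clause" case from the description.
  static String buildDeleteTxnsSql(Collection<Long> txnIds) {
    if (txnIds.isEmpty()) {
      throw new IllegalArgumentException("no txn ids to delete");
    }
    String ids = txnIds.stream()
        .map(String::valueOf)
        .collect(Collectors.joining(","));
    return "delete from TXNS where txn_id in (" + ids + ")";
  }

  public static void main(String[] args) {
    // Made-up sample data: txn 5 has 2 components, txn 7 has 1.
    Map<Long, Long> counts = new LinkedHashMap<>();
    counts.put(5L, 2L);
    counts.put(7L, 1L);

    long rowsDeleted = 3; // stand-in for the delete's update count
    if (allTxnsEmptied(counts, rowsDeleted)) {
      System.out.println(buildDeleteTxnsSql(counts.keySet()));
    }
  }
}
```

If the two numbers differ, the sketch does nothing, and the remaining aborted
txns would still be left to _cleanEmptyAbortedTxns()_ as before.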



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
