[
https://issues.apache.org/jira/browse/HIVE-28787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17939275#comment-17939275
]
Venugopal Reddy K commented on HIVE-28787:
------------------------------------------
[~rtrivedi12] Just to addCurrent method name is:
org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler#cleanEmptyAbortedAndCommittedTxns().
And the current query is:
DELETE FROM "TXNS" WHERE "TXN_ID" NOT IN (SELECT "TC_TXNID" FROM
"TXN_COMPONENTS") AND ("TXN_STATE" = 'a' OR "TXN_STATE" = 'c') AND "TXN_ID" <
lowWaterMark
> Improve/optimize Delete failed txns Sql
> ----------------------------------------
>
> Key: HIVE-28787
> URL: https://issues.apache.org/jira/browse/HIVE-28787
> Project: Hive
> Issue Type: Improvement
> Reporter: Riju Trivedi
> Assignee: Riju Trivedi
> Priority: Major
>
> The current cleanEmptyAbortedTxn() query
> {code:java}
> DELETE FROM "TXNS" WHERE "TXN_ID" NOT IN (SELECT "TC_TXNID" FROM
> "TXN_COMPONENTS") AND "TXN_STATE" = 'a';{code}
> is inefficient especially in PostgreSQL, leading to very high execution times
> when entries get piled up.
> Can we re-write this query using LEFT JOIN or WHERE EXISTS for better
> performance?
> {code:java}
> DELETE FROM "TXNS"
> WHERE NOT EXISTS (
> SELECT 1 FROM "TXN_COMPONENTS"
> WHERE "TXN_COMPONENTS"."TC_TXNID" = "TXNS"."TXN_ID"
> )
> AND "TXN_STATE" = 'a'; {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)