[ 
https://issues.apache.org/jira/browse/HIVE-28787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17939275#comment-17939275
 ] 

Venugopal Reddy K edited comment on HIVE-28787 at 3/28/25 5:40 PM:
-------------------------------------------------------------------

[~rtrivedi12] Just to add

Current method name is:

org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler#cleanEmptyAbortedAndCommittedTxns().

And the current query is:

DELETE FROM "TXNS" WHERE "TXN_ID" NOT IN (SELECT "TC_TXNID" FROM 
"TXN_COMPONENTS") AND ("TXN_STATE" = 'a' OR "TXN_STATE" = 'c') AND "TXN_ID" < 
lowWaterMark

 

 


was (Author: venureddy):
[~rtrivedi12] Just to addCurrent method name is:

org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler#cleanEmptyAbortedAndCommittedTxns().

And the current query is:

DELETE FROM "TXNS" WHERE "TXN_ID" NOT IN (SELECT "TC_TXNID" FROM 
"TXN_COMPONENTS") AND ("TXN_STATE" = 'a' OR "TXN_STATE" = 'c') AND "TXN_ID" < 
lowWaterMark

 

 

> Improve/optimize Delete failed txns Sql 
> ----------------------------------------
>
>                 Key: HIVE-28787
>                 URL: https://issues.apache.org/jira/browse/HIVE-28787
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Riju Trivedi
>            Assignee: Riju Trivedi
>            Priority: Major
>
> The current cleanEmptyAbortedTxn() query
> {code:java}
> DELETE FROM "TXNS" WHERE "TXN_ID" NOT IN (SELECT "TC_TXNID" FROM 
> "TXN_COMPONENTS") AND "TXN_STATE" = 'a';{code}
> is inefficient especially in PostgreSQL, leading to very high execution times 
> when entries get piled up.
> Can we re-write this query using LEFT JOIN or WHERE EXISTS for better 
> performance?
> {code:java}
> DELETE FROM "TXNS"
> WHERE NOT EXISTS (
>     SELECT 1 FROM "TXN_COMPONENTS" 
>     WHERE "TXN_COMPONENTS"."TC_TXNID" = "TXNS"."TXN_ID"
> )
> AND "TXN_STATE" = 'a'; {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to