[
https://issues.apache.org/jira/browse/HIVE-28787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Riju Trivedi updated HIVE-28787:
--------------------------------
Description:
The current cleanEmptyAbortedTxn() query
{code:java}
DELETE FROM "TXNS" WHERE "TXN_ID" NOT IN (SELECT "TC_TXNID" FROM
"TXN_COMPONENTS") AND "TXN_STATE" = 'a';{code}
is inefficient especially in PostgreSQL, leading to very high execution times
when entries get piled up.
Can we re-write this query using LEFT JOIN or WHERE EXISTS for better
performance?
{code:java}
DELETE FROM "TXNS"
WHERE NOT EXISTS (
SELECT 1 FROM "TXN_COMPONENTS"
WHERE "TXN_COMPONENTS"."TC_TXNID" = "TXNS"."TXN_ID"
)
AND "TXN_STATE" = 'a'; {code}
was:
The current cleanEmptyAbortedTxn() query
{code:java}
DELETE FROM "TXNS" WHERE "TXN_ID" NOT IN (SELECT "TC_TXNID" FROM
"TXN_COMPONENTS") AND "TXN_STATE" = 'a';{code}
is inefficient especially in PostgreSQL, leading to very high execution times
when entries get piled up.
Can we re-write this query using LEFT JOIN or WHERE EXISTS for better
performance?
> Improve/optimize Delete failed txns Sql
> ----------------------------------------
>
> Key: HIVE-28787
> URL: https://issues.apache.org/jira/browse/HIVE-28787
> Project: Hive
> Issue Type: Improvement
> Reporter: Riju Trivedi
> Priority: Major
>
> The current cleanEmptyAbortedTxn() query
> {code:java}
> DELETE FROM "TXNS" WHERE "TXN_ID" NOT IN (SELECT "TC_TXNID" FROM
> "TXN_COMPONENTS") AND "TXN_STATE" = 'a';{code}
> is inefficient especially in PostgreSQL, leading to very high execution times
> when entries get piled up.
> Can we re-write this query using LEFT JOIN or WHERE EXISTS for better
> performance?
> {code:java}
> DELETE FROM "TXNS"
> WHERE NOT EXISTS (
> SELECT 1 FROM "TXN_COMPONENTS"
> WHERE "TXN_COMPONENTS"."TC_TXNID" = "TXNS"."TXN_ID"
> )
> AND "TXN_STATE" = 'a'; {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)