[ https://issues.apache.org/jira/browse/HIVE-27198?focusedWorklogId=859969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859969 ]
ASF GitHub Bot logged work on HIVE-27198:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 02/May/23 06:33
Start Date: 02/May/23 06:33
Worklog Time Spent: 10m
Work Description: maheshrajus commented on code in PR #4174:
URL: https://github.com/apache/hive/pull/4174#discussion_r1182136243
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java:
##########
@@ -888,35 +887,13 @@ public void cleanEmptyAbortedAndCommittedTxns() throws MetaException {
*/
long lowWaterMark = getOpenTxnTimeoutLowBoundaryTxnId(dbConn);
- String s = "SELECT \"TXN_ID\" FROM \"TXNS\" WHERE " +
+ String s = "DELETE FROM \"TXNS\" WHERE " +
Review Comment:
Hi @deniskuzZ, the query is not fully hardcoded.
It has txn state fields in the middle of the query:
TxnStatus.ABORTED and TxnStatus.COMMITTED. So it is better to keep it like this
for readability.
Issue Time Tracking
-------------------
Worklog Id: (was: 859969)
Time Spent: 1h 40m (was: 1.5h)
> Delete directly aborted transactions instead of select and loading ids
> ----------------------------------------------------------------------
>
> Key: HIVE-27198
> URL: https://issues.apache.org/jira/browse/HIVE-27198
> Project: Hive
> Issue Type: Improvement
> Reporter: Mahesh Raju Somalaraju
> Assignee: Mahesh Raju Somalaraju
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> When cleaning aborted transactions, we can delete the txns directly
> instead of selecting them and then processing the ids.
> method name:
> cleanEmptyAbortedAndCommittedTxns
> Code:
> String s = "SELECT \"TXN_ID\" FROM \"TXNS\" WHERE " +
> "\"TXN_ID\" NOT IN (SELECT \"TC_TXNID\" FROM \"TXN_COMPONENTS\") AND " +
> " (\"TXN_STATE\" = " + TxnStatus.ABORTED + " OR \"TXN_STATE\" = " +
> TxnStatus.COMMITTED + ") AND "
> + " \"TXN_ID\" < " + lowWaterMark;
>
> proposed code:
> String s = "DELETE FROM \"TXNS\" WHERE " +
> "\"TXN_ID\" NOT IN (SELECT \"TC_TXNID\" FROM \"TXN_COMPONENTS\") AND " +
> " (\"TXN_STATE\" = " + TxnStatus.ABORTED + " OR \"TXN_STATE\" = " +
> TxnStatus.COMMITTED + ") AND "
> + " \"TXN_ID\" < " + lowWaterMark;
>
> The select should be eliminated, and the delete should work with the where
> clause directly instead of a built IN clause.
> There is no reason to load the ids into memory and then generate a huge
> sql statement.
>
> Batching is also not necessary here; we can delete the records directly.
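
For illustration, the direct delete described above could be sketched roughly as follows. This is a minimal standalone sketch, not the code from PR #4174: the class name DirectDeleteSketch, the helper names buildDeleteSql and deleteEmptyTxns, and the nested TxnStatus enum with its quoted one-character literals are all assumptions made so the example compiles on its own. Only the table names, column names, and the shape of the where clause come from the description.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class DirectDeleteSketch {

  // Hypothetical stand-in for Hive's TxnStatus. The quoted single-character
  // literals returned by toString() are an assumption made for this sketch.
  public enum TxnStatus {
    ABORTED("'a'"), COMMITTED("'c'");

    private final String sqlLiteral;

    TxnStatus(String sqlLiteral) {
      this.sqlLiteral = sqlLiteral;
    }

    @Override
    public String toString() {
      return sqlLiteral;
    }
  }

  // Builds one DELETE statement that reuses the where clause of the old
  // SELECT, so no txn ids are loaded into memory first.
  public static String buildDeleteSql(long lowWaterMark) {
    return "DELETE FROM \"TXNS\" WHERE " +
        "\"TXN_ID\" NOT IN (SELECT \"TC_TXNID\" FROM \"TXN_COMPONENTS\") AND " +
        " (\"TXN_STATE\" = " + TxnStatus.ABORTED + " OR \"TXN_STATE\" = " +
        TxnStatus.COMMITTED + ") AND " +
        " \"TXN_ID\" < " + lowWaterMark;
  }

  // Runs the delete in a single round trip and returns the number of rows
  // removed. The caller supplies the connection and handles commit/rollback.
  public static int deleteEmptyTxns(Connection dbConn, long lowWaterMark)
      throws SQLException {
    try (Statement stmt = dbConn.createStatement()) {
      return stmt.executeUpdate(buildDeleteSql(lowWaterMark));
    }
  }

  public static void main(String[] args) {
    // Print the generated statement for an example low water mark.
    System.out.println(buildDeleteSql(100L));
  }
}
```

With this shape the database evaluates the where clause itself, so there is no id list to batch and no generated IN clause that grows with the number of rows.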
--
This message was sent by Atlassian Jira
(v8.20.10#820010)