SourabhBadhya commented on code in PR #4313:
URL: https://github.com/apache/hive/pull/4313#discussion_r1201647967


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java:
##########
@@ -472,23 +473,38 @@ public List<CompactionInfo> findReadyToClean(long minOpenTxnWaterMark, long rete
 
   @Override
   @RetrySemantics.ReadOnly
-  public List<CompactionInfo> findReadyToCleanAborts(long abortedTimeThreshold, int abortedThreshold) throws MetaException {
+  public List<CompactionInfo> findReadyToCleanAborts(long abortedTimeThreshold, int abortedThreshold, long retentionTime) throws MetaException {
     try {
       List<CompactionInfo> readyToCleanAborts = new ArrayList<>();
       try (Connection dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED, connPoolCompaction);
            Statement stmt = dbConn.createStatement()) {
         boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-        String sCheckAborted = "SELECT \"tc\".\"TC_DATABASE\", \"tc\".\"TC_TABLE\", \"tc\".\"TC_PARTITION\", " +
-            " \"tc\".\"MIN_TXN_START_TIME\", \"tc\".\"ABORTED_TXN_COUNT\", \"minOpenWriteTxnId\".\"MIN_OPEN_WRITE_TXNID\" FROM " +
-            " ( SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", " +
-            " MIN(\"TXN_STARTED\") AS \"MIN_TXN_START_TIME\", COUNT(*) AS \"ABORTED_TXN_COUNT\" FROM \"TXNS\", \"TXN_COMPONENTS\" " +
-            " WHERE \"TXN_ID\" = \"TC_TXNID\" AND \"TXN_STATE\" = " + TxnStatus.ABORTED +
-            " GROUP BY \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\" " +
-            (checkAbortedTimeThreshold ? "" : " HAVING COUNT(*) > " + abortedThreshold) + " ) \"tc\" " +
-            " LEFT JOIN ( SELECT MIN(\"TC_TXNID\") AS \"MIN_OPEN_WRITE_TXNID\", \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\" FROM \"TXNS\", \"TXN_COMPONENTS\" " +
-            " WHERE \"TXN_ID\" = \"TC_TXNID\" AND \"TXN_STATE\"=" + TxnStatus.OPEN + " GROUP BY \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\" ) \"minOpenWriteTxnId\" " +
-            " ON \"tc\".\"TC_DATABASE\" = \"minOpenWriteTxnId\".\"TC_DATABASE\" AND \"tc\".\"TC_TABLE\" = \"minOpenWriteTxnId\".\"TC_TABLE\"" +
-            " AND (\"tc\".\"TC_PARTITION\" = \"minOpenWriteTxnId\".\"TC_PARTITION\" OR (\"tc\".\"TC_PARTITION\" IS NULL AND \"minOpenWriteTxnId\".\"TC_PARTITION\" IS NULL))";
+        String firstInnerQuery = "SELECT \"tc\".\"TC_DATABASE\" AS \"DB\", \"tc\".\"TC_TABLE\" AS \"TBL\", \"tc\".\"TC_PARTITION\" AS \"PART\", " +

Review Comment:
   @deniskuzZ There are clear benefits to using a separate table here:
   1. It separates the flow of metadata. It also eliminates the chance of
breaking compaction or abort cleanup when the metadata of the other one is
modified.
   2. It makes debugging easier in case of failures.
   
   I have not personally done much performance measurement of the query. The
query is a UNION ALL of three queries. The second and third inner queries
should not be costly, since each usually retrieves one record per
table/partition, and only when retry records exist for that table/partition.
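
For illustration only, here is a minimal sketch of the query shape described above: three inner queries with the same column list, combined with UNION ALL. This is not the code from the PR. The class `UnionAllSketch`, the helper `unionAll`, the table `HYPOTHETICAL_RETRY_TABLE`, and its `RT_*` columns are all assumptions made for the example, and the three queries are simplified placeholders.

```java
import java.util.List;

public class UnionAllSketch {

  // Combines SELECT statements that return the same columns into one query.
  // UNION ALL keeps duplicate rows, so the database does not need the
  // de-duplication step that a plain UNION implies.
  static String unionAll(List<String> innerQueries) {
    return String.join(" UNION ALL ", innerQueries);
  }

  public static void main(String[] args) {
    // First inner query (simplified): aborted transactions per table/partition.
    // The "?" stands in for the aborted-state value.
    String first = "SELECT \"TC_DATABASE\" AS \"DB\", \"TC_TABLE\" AS \"TBL\", \"TC_PARTITION\" AS \"PART\""
        + " FROM \"TXNS\", \"TXN_COMPONENTS\""
        + " WHERE \"TXN_ID\" = \"TC_TXNID\" AND \"TXN_STATE\" = ?"
        + " GROUP BY \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\"";

    // Second and third inner queries (hypothetical): lookups against a
    // separate retry table, expected to match few rows per table/partition.
    String second = "SELECT \"RT_DATABASE\" AS \"DB\", \"RT_TABLE\" AS \"TBL\", \"RT_PARTITION\" AS \"PART\""
        + " FROM \"HYPOTHETICAL_RETRY_TABLE\" WHERE \"RT_PARTITION\" IS NOT NULL";
    String third = "SELECT \"RT_DATABASE\" AS \"DB\", \"RT_TABLE\" AS \"TBL\", \"RT_PARTITION\" AS \"PART\""
        + " FROM \"HYPOTHETICAL_RETRY_TABLE\" WHERE \"RT_PARTITION\" IS NULL";

    System.out.println(unionAll(List.of(first, second, third)));
  }
}
```

Because each branch of a UNION ALL can be run on its own, the cost of the second and third branches can be checked separately with the database's EXPLAIN facility.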



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

