[ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855785
 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:58
            Start Date: 10/Apr/23 11:58
    Worklog Time Spent: 10m 
      Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161664755


##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/CompactionCleaner.java:
##########
@@ -259,49 +247,11 @@ private void cleanUsingAcidDir(CompactionInfo ci, String 
location, long minOpenT
      */
 
     // Creating 'reader' list since we are interested in the set of 'obsolete' 
files
-    ValidReaderWriteIdList validWriteIdList = getValidCleanerWriteIdList(ci, 
validTxnList);
-    LOG.debug("Cleaning based on writeIdList: {}", validWriteIdList);
-
-    Path path = new Path(location);
-    FileSystem fs = path.getFileSystem(conf);
-
-    // Collect all the files/dirs
-    Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = 
AcidUtils.getHdfsDirSnapshotsForCleaner(fs, path);
-    AcidDirectory dir = AcidUtils.getAcidState(fs, path, conf, 
validWriteIdList, Ref.from(false), false,
-            dirSnapshots);
+    ValidReaderWriteIdList validWriteIdList = 
getValidCleanerWriteIdListForCompactionCleaner(ci, validTxnList);
     Table table = metadataCache.computeIfAbsent(ci.getFullTableName(), () -> 
resolveTable(ci.dbname, ci.tableName));
-    boolean isDynPartAbort = CompactorUtil.isDynPartAbort(table, ci.partName);
-
-    List<Path> obsoleteDirs = CompactorUtil.getObsoleteDirs(dir, 
isDynPartAbort);
-    if (isDynPartAbort || dir.hasUncompactedAborts()) {
-      ci.setWriteIds(dir.hasUncompactedAborts(), dir.getAbortedWriteIds());
-    }
-
-    List<Path> deleted = fsRemover.clean(new 
CleanupRequestBuilder().setLocation(location)
-            
.setDbName(ci.dbname).setFullPartitionName(ci.getFullPartitionName())
-            .setRunAs(ci.runAs).setObsoleteDirs(obsoleteDirs).setPurge(true)
-            .build());
-
-    if (!deleted.isEmpty()) {
-      AcidMetricService.updateMetricsFromCleaner(ci.dbname, ci.tableName, 
ci.partName, dir.getObsolete(), conf,
-              txnHandler);
-    }
-
-    // Make sure there are no leftovers below the compacted watermark
-    boolean success = false;
-    conf.set(ValidTxnList.VALID_TXNS_KEY, new ValidReadTxnList().toString());
-    dir = AcidUtils.getAcidState(fs, path, conf, new ValidReaderWriteIdList(
-                    ci.getFullTableName(), new long[0], new BitSet(), 
ci.highestWriteId, Long.MAX_VALUE),
-            Ref.from(false), false, dirSnapshots);
+    LOG.debug("Cleaning based on writeIdList: {}", validWriteIdList);
 
-    List<Path> remained = subtract(CompactorUtil.getObsoleteDirs(dir, 
isDynPartAbort), deleted);
-    if (!remained.isEmpty()) {
-      LOG.warn("{} Remained {} obsolete directories from {}. {}",
-              idWatermark(ci), remained.size(), location, 
CompactorUtil.getDebugInfo(remained));
-    } else {
-      LOG.debug("{} All cleared below the watermark: {} from {}", 
idWatermark(ci), ci.highestWriteId, location);
-      success = true;
-    }
+    boolean success = cleanAndVerifyObsoleteDirectories(ci, location, 
validWriteIdList, table);

Review Comment:
   1 line below no need to check for `isDynPartAbort `





Issue Time Tracking
-------------------

    Worklog Id:     (was: 855785)
    Time Spent: 9h 40m  (was: 9.5h)

> Implement a separate handler to handle aborted transaction cleanup
> ------------------------------------------------------------------
>
>                 Key: HIVE-27020
>                 URL: https://issues.apache.org/jira/browse/HIVE-27020
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sourabh Badhya
>            Assignee: Sourabh Badhya
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to