[
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855785
]
ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 10/Apr/23 11:58
Start Date: 10/Apr/23 11:58
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161664755
##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/CompactionCleaner.java:
##########
@@ -259,49 +247,11 @@ private void cleanUsingAcidDir(CompactionInfo ci, String location, long minOpenT
        */
       // Creating 'reader' list since we are interested in the set of 'obsolete' files
-      ValidReaderWriteIdList validWriteIdList = getValidCleanerWriteIdList(ci, validTxnList);
-      LOG.debug("Cleaning based on writeIdList: {}", validWriteIdList);
-
-      Path path = new Path(location);
-      FileSystem fs = path.getFileSystem(conf);
-
-      // Collect all the files/dirs
-      Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = AcidUtils.getHdfsDirSnapshotsForCleaner(fs, path);
-      AcidDirectory dir = AcidUtils.getAcidState(fs, path, conf, validWriteIdList, Ref.from(false), false,
-          dirSnapshots);
+      ValidReaderWriteIdList validWriteIdList = getValidCleanerWriteIdListForCompactionCleaner(ci, validTxnList);
       Table table = metadataCache.computeIfAbsent(ci.getFullTableName(), () -> resolveTable(ci.dbname, ci.tableName));
-      boolean isDynPartAbort = CompactorUtil.isDynPartAbort(table, ci.partName);
-
-      List<Path> obsoleteDirs = CompactorUtil.getObsoleteDirs(dir, isDynPartAbort);
-      if (isDynPartAbort || dir.hasUncompactedAborts()) {
-        ci.setWriteIds(dir.hasUncompactedAborts(), dir.getAbortedWriteIds());
-      }
-
-      List<Path> deleted = fsRemover.clean(new CleanupRequestBuilder().setLocation(location)
-          .setDbName(ci.dbname).setFullPartitionName(ci.getFullPartitionName())
-          .setRunAs(ci.runAs).setObsoleteDirs(obsoleteDirs).setPurge(true)
-          .build());
-
-      if (!deleted.isEmpty()) {
-        AcidMetricService.updateMetricsFromCleaner(ci.dbname, ci.tableName, ci.partName, dir.getObsolete(), conf,
-            txnHandler);
-      }
-
-      // Make sure there are no leftovers below the compacted watermark
-      boolean success = false;
-      conf.set(ValidTxnList.VALID_TXNS_KEY, new ValidReadTxnList().toString());
-      dir = AcidUtils.getAcidState(fs, path, conf, new ValidReaderWriteIdList(
-          ci.getFullTableName(), new long[0], new BitSet(), ci.highestWriteId, Long.MAX_VALUE),
-          Ref.from(false), false, dirSnapshots);
+      LOG.debug("Cleaning based on writeIdList: {}", validWriteIdList);
-      List<Path> remained = subtract(CompactorUtil.getObsoleteDirs(dir, isDynPartAbort), deleted);
-      if (!remained.isEmpty()) {
-        LOG.warn("{} Remained {} obsolete directories from {}. {}",
-            idWatermark(ci), remained.size(), location, CompactorUtil.getDebugInfo(remained));
-      } else {
-        LOG.debug("{} All cleared below the watermark: {} from {}", idWatermark(ci), ci.highestWriteId, location);
-        success = true;
-      }
+      boolean success = cleanAndVerifyObsoleteDirectories(ci, location, validWriteIdList, table);
Review Comment:
One line below, there is no need to check for `isDynPartAbort`.
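
For context, here is a sketch of what the extracted `cleanAndVerifyObsoleteDirectories` helper plausibly consolidates, reassembled from the removed lines in the hunk above. The signature matches the call site, but the body, exception handling, and exact field access are assumptions based on this diff alone, not the merged code.

```java
// Sketch only: assumes this method sits in the handler class and can use its
// fields (conf, fsRemover, metadataCache, txnHandler, LOG) and the static
// imports (subtract, idWatermark) visible in the original file.
private boolean cleanAndVerifyObsoleteDirectories(CompactionInfo ci, String location,
    ValidReaderWriteIdList validWriteIdList, Table table) throws Exception {
  Path path = new Path(location);
  FileSystem fs = path.getFileSystem(conf);

  // Collect all files/dirs once; the same snapshot backs both passes below.
  Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = AcidUtils.getHdfsDirSnapshotsForCleaner(fs, path);
  AcidDirectory dir = AcidUtils.getAcidState(fs, path, conf, validWriteIdList, Ref.from(false), false,
      dirSnapshots);

  boolean isDynPartAbort = CompactorUtil.isDynPartAbort(table, ci.partName);
  List<Path> obsoleteDirs = CompactorUtil.getObsoleteDirs(dir, isDynPartAbort);
  if (isDynPartAbort || dir.hasUncompactedAborts()) {
    ci.setWriteIds(dir.hasUncompactedAborts(), dir.getAbortedWriteIds());
  }

  // Remove the obsolete directories and record metrics for what was deleted.
  List<Path> deleted = fsRemover.clean(new CleanupRequestBuilder().setLocation(location)
      .setDbName(ci.dbname).setFullPartitionName(ci.getFullPartitionName())
      .setRunAs(ci.runAs).setObsoleteDirs(obsoleteDirs).setPurge(true)
      .build());
  if (!deleted.isEmpty()) {
    AcidMetricService.updateMetricsFromCleaner(ci.dbname, ci.tableName, ci.partName, dir.getObsolete(),
        conf, txnHandler);
  }

  // Second pass: re-read the ACID state below the compacted watermark and
  // verify nothing obsolete remains.
  conf.set(ValidTxnList.VALID_TXNS_KEY, new ValidReadTxnList().toString());
  dir = AcidUtils.getAcidState(fs, path, conf, new ValidReaderWriteIdList(
      ci.getFullTableName(), new long[0], new BitSet(), ci.highestWriteId, Long.MAX_VALUE),
      Ref.from(false), false, dirSnapshots);
  List<Path> remained = subtract(CompactorUtil.getObsoleteDirs(dir, isDynPartAbort), deleted);
  if (!remained.isEmpty()) {
    LOG.warn("{} Remained {} obsolete directories from {}. {}",
        idWatermark(ci), remained.size(), location, CompactorUtil.getDebugInfo(remained));
    return false;
  }
  LOG.debug("{} All cleared below the watermark: {} from {}", idWatermark(ci), ci.highestWriteId, location);
  return true;
}
```

Note that reusing the same `dirSnapshots` map for both `getAcidState` calls avoids re-listing the filesystem for the watermark check.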
Issue Time Tracking
-------------------
Worklog Id: (was: 855785)
Time Spent: 9h 40m (was: 9.5h)
> Implement a separate handler to handle aborted transaction cleanup
> ------------------------------------------------------------------
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sourabh Badhya
> Assignee: Sourabh Badhya
> Priority: Major
> Labels: pull-request-available
> Time Spent: 9h 40m
> Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different
> entities, implement a separate handler that can create requests for aborted
> transaction cleanup. This would move aborted transaction cleanup exclusively
> to the cleaner.
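
To make the description concrete, below is a minimal, hypothetical skeleton of such a handler; all names (`TaskHandler`, `AbortedTxnCleaner`, `getTasks`) are illustrative placeholders for the shape of the split, not the classes merged under this ticket.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical skeleton: each handler turns metastore state into a batch of
// concrete cleanup work items that the cleaner thread pool executes.
abstract class TaskHandler {
  abstract List<Runnable> getTasks();
}

class AbortedTxnCleaner extends TaskHandler {
  @Override
  List<Runnable> getTasks() {
    List<Runnable> tasks = new ArrayList<>();
    // 1. Find tables/partitions with aborted writes that are ready for
    //    cleanup (e.g. via a "ready to clean aborts" metastore query).
    // 2. For each, build a task that removes the aborted deltas and then
    //    clears the corresponding TXN_COMPONENTS entries.
    return tasks;
  }
}
```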