[
https://issues.apache.org/jira/browse/HIVE-25492?focusedWorklogId=752705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752705
]
ASF GitHub Bot logged work on HIVE-25492:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Apr/22 08:21
Start Date: 05/Apr/22 08:21
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3157:
URL: https://github.com/apache/hive/pull/3157#discussion_r842506372
##########
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java:
##########
@@ -1480,6 +1482,57 @@ private static ValidTxnList getValidTxnList(Configuration conf) {
     return validTxnList;
   }
+
+  /**
+   * In case of the cleaner, we don't need to go into file level, it is enough to collect base/delta/deletedelta directories.
+   *
+   * @param fs the filesystem used for the directory lookup
+   * @param path the path of the table or partition needs to be cleaned
+   * @return The listed directory snapshot needs to be checked for cleaning
+   * @throws IOException on filesystem errors
+   */
+  public static Map<Path, HdfsDirSnapshot> getHdfsDirSnapshotsForCleaner(final FileSystem fs, final Path path)
+      throws IOException {
+    Map<Path, HdfsDirSnapshot> dirToSnapshots = new HashMap<>();
+    // depth first search
+    Deque<RemoteIterator<FileStatus>> stack = new ArrayDeque<>();
+    stack.push(fs.listStatusIterator(path));
+    while (!stack.isEmpty()) {
+      RemoteIterator<FileStatus> itr = stack.pop();
+      while (itr.hasNext()) {
+        FileStatus fStatus = itr.next();
+        Path fPath = fStatus.getPath();
+        if (acidHiddenFileFilter.accept(fPath) && acidTempDirFilter.accept(fPath)) {
Review Comment:
We could use hiddenFileFilter here, since we don't need to include METADATA_FILE and
ACID_FORMAT.
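To illustrate the distinction the comment relies on, here is a minimal, self-contained sketch. It does not use the Hadoop or Hive classes; the two predicates only model the assumed behaviour of hiddenFileFilter and acidHiddenFileFilter on plain file names. The class name FilterSketch, the helper keep, and the marker-file names "_metadata_acid" and "_orc_acid_version" are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterSketch {
    // Assumed marker-file names, standing in for METADATA_FILE and ACID_FORMAT.
    static final String METADATA_FILE = "_metadata_acid";
    static final String ACID_FORMAT = "_orc_acid_version";

    // Models a plain hidden-file filter: reject names starting with '_' or '.'.
    static final Predicate<String> HIDDEN_FILE_FILTER =
        name -> !name.startsWith("_") && !name.startsWith(".");

    // Models the ACID-aware variant: same rule, but the two marker files pass.
    static final Predicate<String> ACID_HIDDEN_FILE_FILTER =
        name -> name.startsWith(METADATA_FILE)
            || name.startsWith(ACID_FORMAT)
            || HIDDEN_FILE_FILTER.test(name);

    // Hypothetical helper: keep only the names accepted by the filter.
    static List<String> keep(List<String> names, Predicate<String> filter) {
        return names.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "base_0000005", "delta_0000001_0000001",
            "_orc_acid_version", "_metadata_acid", ".hive-staging");
        System.out.println("acid-aware: " + keep(names, ACID_HIDDEN_FILE_FILTER));
        System.out.println("plain:      " + keep(names, HIDDEN_FILE_FILTER));
    }
}
```

Under these assumptions the plain filter keeps only the base and delta directory names, which is all a directory-level listing for the cleaner would need.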
Issue Time Tracking
-------------------
Worklog Id: (was: 752705)
Time Spent: 2h (was: 1h 50m)
> Major query-based compaction is skipped if partition is empty
> -------------------------------------------------------------
>
> Key: HIVE-25492
> URL: https://issues.apache.org/jira/browse/HIVE-25492
> Project: Hive
> Issue Type: Bug
> Reporter: Karen Coppage
> Assignee: Antal Sinkovits
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h
>
> Currently if the result of query-based compaction is an empty base, delta, or
> delete delta, the empty directory is deleted.
> This is because of minor compaction – if there are only deltas to compact,
> then no compacted delete delta should be created (only a compacted delta). In
> the same way, if there are only delete deltas to compact, then no compacted
> delta should be created (only a compacted delete delta).
> There is an issue with major compaction. If all the data in the partition has
> been deleted, then we should get an empty base directory after compaction.
> Instead, the empty base directory is deleted because it's empty. Compaction
> claims to succeed, but we end up with the same deltas/delete deltas we started
> with – in effect, compaction does not run.
> Where to start? MajorQueryCompactor#commitCompaction
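The behaviour described above can be sketched as a small rule: drop an empty result directory after minor compaction, but keep an empty base after major compaction. This is only an illustration using java.nio.file on a local filesystem; it is not the code in MajorQueryCompactor#commitCompaction and not necessarily how the PR resolves the issue. The names EmptyResultDirSketch, CompactionType, isEmpty, and commitResultDir are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class EmptyResultDirSketch {
    enum CompactionType { MINOR, MAJOR }

    // True if the directory has no entries.
    static boolean isEmpty(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        }
    }

    // Hypothetical commit step: returns true if resultDir still exists afterwards.
    // An empty result is removed only for minor compaction; for major compaction
    // the empty base is kept, because it records that all rows were deleted.
    static boolean commitResultDir(Path resultDir, CompactionType type) throws IOException {
        if (type == CompactionType.MINOR && isEmpty(resultDir)) {
            Files.delete(resultDir);
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("compaction-sketch");
        Path emptyDelta = Files.createDirectory(root.resolve("delete_delta_0000001_0000005"));
        Path emptyBase = Files.createDirectory(root.resolve("base_0000005"));
        System.out.println("minor keeps empty dir: " + commitResultDir(emptyDelta, CompactionType.MINOR));
        System.out.println("major keeps empty dir: " + commitResultDir(emptyBase, CompactionType.MAJOR));
    }
}
```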
--
This message was sent by Atlassian Jira
(v8.20.1#820001)