klcopp commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r533584095
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##########
@@ -316,6 +314,30 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
       }
       fs.delete(dead, true);
     }
-    return true;
+    // Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this
+    // number reaches 0.
+    return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location, Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots)
+      throws IOException {
+    ValidTxnList validTxnList = new ValidReadTxnList();
Review comment:
> But if HIVE-24291 is present it shouldn't hurt.
This isn't necessarily true: as @pvargacl noted, there could have been aborts
since the compaction happened.
The question here is whether this change or HIVE-24314 is worse for users whose
version lacks HIVE-23107 (remove MIN_HISTORY_LEVEL) and related changes.
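
For readers following the thread, the check described in the Javadoc above can be illustrated with a toy model. This is a minimal sketch, not the Hive implementation: `Dir` and `countEventuallyObsolete` are hypothetical names, and the sketch assumes that with no open transactions the newest base is always usable, so every older base and every delta at or below that base's write ID counts as obsolete. The real `getAcidState` logic also handles overlapping deltas and aborted writes, which are ignored here.

```java
import java.util.ArrayList;
import java.util.List;

class EventuallyObsoleteSketch {
    /** Hypothetical stand-in for a base or delta directory; not a Hive class. */
    static final class Dir {
        final String name;
        final boolean isBase;
        final long maxWriteId;

        Dir(String name, boolean isBase, long maxWriteId) {
            this.name = name;
            this.isBase = isBase;
            this.maxWriteId = maxWriteId;
        }
    }

    /**
     * Counts directories that would be obsolete if no transaction were open.
     * Assumption of this sketch: the newest base shadows older bases and
     * every delta whose highest write ID is not above that base.
     */
    static int countEventuallyObsolete(List<Dir> dirs) {
        long newestBase = -1;
        for (Dir d : dirs) {
            if (d.isBase) {
                newestBase = Math.max(newestBase, d.maxWriteId);
            }
        }
        int obsolete = 0;
        for (Dir d : dirs) {
            boolean shadowed = d.isBase
                ? d.maxWriteId < newestBase
                : d.maxWriteId <= newestBase;
            if (shadowed) {
                obsolete++;
            }
        }
        return obsolete;
    }

    public static void main(String[] args) {
        List<Dir> dirs = new ArrayList<>();
        dirs.add(new Dir("base_2", true, 2));
        dirs.add(new Dir("delta_3_3", false, 3));
        dirs.add(new Dir("base_5", true, 5));
        dirs.add(new Dir("delta_6_6", false, 6));
        // The Cleaner would only mark the entry cleaned once this reaches 0.
        System.out.println(countEventuallyObsolete(dirs));
    }
}
```

In this model the Cleaner's return value corresponds to `countEventuallyObsolete(dirs) == 0`; a nonzero count means some directories could not be removed yet and the entry should stay queued.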