[GitHub] [accumulo] milleruntime commented on a diff in pull request #2792: closes #1377 - ensure all tables are checked ...

GitBox Wed, 29 Jun 2022 07:38:15 -0700


milleruntime commented on code in PR #2792:
URL: https://github.com/apache/accumulo/pull/2792#discussion_r909832662



##########
server/gc/src/main/java/org/apache/accumulo/gc/GCRun.java:
##########
@@ -457,4 +459,28 @@ public long getErrorsStat() {
   public long getCandidatesStat() {
     return candidates;
   }
+
+  @Override
+  public boolean isRootTable() {
+    return level == DataLevel.ROOT;
+  }
+
+  @Override
+  public boolean isMetadataTable() {
+    return level == DataLevel.METADATA;
+  }
+
+  @Override
+  public Set<TableId> getCandidateTableIDs() {
+    if (isRootTable()) {
+      return Collections.singleton(MetadataTable.ID);
+    } else if (isMetadataTable()) {
+      Set<TableId> tableIds = new HashSet<>(getTableIDs());

Review Comment:
   This is OK, but calling `getTableIDs()` alone is probably not enough on a 
highly active cluster. It just calls the API and returns whatever is cached at 
that moment, so the result can already be stale by the time the scan runs.
   
   @ctubbsii's comment from the original PR: "It can't detect tables that existed 
at the start but skipped, if the ZooCache wasn't up-to-date, and it can't 
detect skipping over any tables that were created or deleted during the scan. 
This is especially problematic if a new table was created as the result of a 
clone operation, which will duplicate all the file references for the new 
table."
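
   To make the concern concrete, here is a minimal sketch of one possible 
mitigation: read the table IDs before and after the scan and treat any 
difference as a reason to distrust the run. Everything in it is hypothetical 
and is not Accumulo API: `TableIdSnapshotCheck`, `tableSetStableDuring`, and 
the `Supplier` standing in for the table ID lookup are illustration only, and 
plain strings stand in for `TableId`. It only catches changes that are visible 
at the two read points. It does not help if the source is stale at both reads, 
or if a table is created and deleted between them.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

public class TableIdSnapshotCheck {

  /**
   * Runs the scan between two reads of the table IDs and reports whether the
   * set was identical before and after. The supplier is a hypothetical
   * stand-in for whatever lookup the GC uses.
   */
  public static boolean tableSetStableDuring(Supplier<Set<String>> tableIdSource,
      Runnable scan) {
    // Copy each read so later changes to the source cannot alter the snapshot.
    Set<String> before = new HashSet<>(tableIdSource.get());
    scan.run();
    Set<String> after = new HashSet<>(tableIdSource.get());
    return before.equals(after);
  }

  public static void main(String[] args) {
    Set<String> live = new HashSet<>(Set.of("1", "2"));

    // Nothing changes during the scan.
    boolean quiet = tableSetStableDuring(() -> live, () -> {});

    // A table appears during the scan, as a clone operation would cause.
    boolean cloned = tableSetStableDuring(() -> live, () -> live.add("3"));

    System.out.println(quiet + " " + cloned); // prints "true false"
  }
}
```

   A caller could skip deleting candidates for that cycle whenever the check 
returns false and retry on the next run.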



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

