keith-turner commented on a change in pull request #2293:
URL: https://github.com/apache/accumulo/pull/2293#discussion_r733847427



##########
File path: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionAlgorithm.java
##########
@@ -217,11 +221,43 @@ private void confirmDeletes(GarbageCollectionEnvironment gce,
         throw new RuntimeException(
             "Scanner over metadata table returned unexpected column : " + entry.getKey());
     }
+    Set<String> tableIdsAfter = gce.getTableIDs();
+    ensureAllTablesChecked(Collections.unmodifiableSet(tableIdsBefore),
+        Collections.unmodifiableSet(tableIdsSeen), Collections.unmodifiableSet(tableIdsAfter));
 
     confirmDeletesFromReplication(gce.getReplicationNeededIterator(),
         candidateMap.entrySet().iterator());
   }
 
+  @VisibleForTesting
+  /**
+   *
+   */
+  protected void ensureAllTablesChecked(Set<String> tableIdsBefore, Set<String> tableIdsSeen,
+      Set<String> tableIdsAfter) {
+
+    // if a table was added or deleted during this run, it is acceptable to not
+    // have seen those tables ids when scanning the metadata table. So get the intersection
+    Set<String> tableIdsMustHaveSeen = new HashSet<>(tableIdsBefore);
+    tableIdsMustHaveSeen.retainAll(tableIdsAfter);
+
+    if (tableIdsMustHaveSeen.isEmpty() && !tableIdsSeen.isEmpty()) {
+      // we saw no table ids in ZK but did in the metadata table. This is unexpected.
+      throw new RuntimeException(
+          "Saw no table ids in ZK but did see table ids in metadata table: " + tableIdsSeen);
+    }
+
+    // From that intersection, remove all the table ids that were seen.
+    tableIdsMustHaveSeen.removeAll(tableIdsSeen);

Review comment:
   > Maybe in 2.x we need a configurable mode for the GC to handle both use cases.

   We could also explore reducing or removing the ambiguity the GC faces when
   trying to distinguish tables being added or removed from some kind of silent
   error when reading from ZK. For example, for table deletion we could make
   the GC, instead of the manager, remove table ids (with a certain table state
   like DELETED) from ZK. The GC could then positively identify a table that
   was deleted while it was scanning the metadata table, and know that it is OK
   to see or not see that table in the metadata table.
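
   A rough sketch of how that check might look, to make the idea concrete. Everything here is hypothetical: `DeletedTableCheckSketch`, the `TableState` enum, and both method names are illustrative and not existing Accumulo APIs. It assumes the manager only marks a table DELETED and that the GC is the only process that removes the id from ZK, so a deleted table stays visible in the "after" snapshot.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: these names are illustrative, not Accumulo APIs.
final class DeletedTableCheckSketch {

  // Assumed stand-in for the table state stored in ZK.
  enum TableState { ONLINE, OFFLINE, DELETED }

  /**
   * Returns table ids that were live (not DELETED) in both ZK snapshots but
   * were never seen while scanning the metadata table.
   */
  static Set<String> missingTableIds(Map<String, TableState> zkBefore,
      Map<String, TableState> zkAfter, Set<String> tableIdsSeen) {
    Set<String> mustHaveSeen = new HashSet<>();
    for (Map.Entry<String, TableState> e : zkBefore.entrySet()) {
      TableState after = zkAfter.get(e.getKey());
      boolean liveBefore = e.getValue() != TableState.DELETED;
      boolean liveAfter = after != null && after != TableState.DELETED;
      // Tables marked DELETED are positively identified, so seeing or not
      // seeing them in the metadata table is acceptable.
      if (liveBefore && liveAfter) {
        mustHaveSeen.add(e.getKey());
      }
    }
    mustHaveSeen.removeAll(tableIdsSeen);
    return mustHaveSeen;
  }

  /**
   * Returns table ids seen in the metadata table that ZK does not know about
   * at all. Under the assumption that only the GC removes ids from ZK, these
   * cannot be explained by a concurrent delete.
   */
  static Set<String> unexplainedTableIds(Map<String, TableState> zkAfter,
      Set<String> tableIdsSeen) {
    Set<String> unexplained = new HashSet<>(tableIdsSeen);
    unexplained.removeAll(zkAfter.keySet());
    return unexplained;
  }
}
```

   With something like this, a non-empty result from either method would point to a real inconsistency rather than a table that happened to be deleted mid-scan. Tables created during the scan are absent from the "before" snapshot, so they are still never required to have been seen.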



