keith-turner commented on code in PR #2792:
URL: https://github.com/apache/accumulo/pull/2792#discussion_r972356614


##########
server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionAlgorithm.java:
##########
@@ -216,6 +225,46 @@ private long removeBlipCandidates(GarbageCollectionEnvironment gce,
     return blipCount;
   }
 
+  /**
+   * Double check that no tables were missed during GC
+   */
+  @VisibleForTesting
+  protected void ensureAllTablesChecked(Set<TableId> tableIdsBefore, Set<TableId> tableIdsSeen,
+      Set<TableId> tableIdsAfter) {
+
+    // if a table was added or deleted during this run, it is acceptable to not
+    // have seen those table ids when scanning the metadata table. So get the intersection
+    final Set<TableId> tableIdsMustHaveSeen = new HashSet<>(tableIdsBefore);
+    tableIdsMustHaveSeen.retainAll(tableIdsAfter);
+
+    if (tableIdsMustHaveSeen.isEmpty() && !tableIdsSeen.isEmpty()) {
+      throw new RuntimeException("Garbage collection will not proceed because "
+          + "table ids were seen in the metadata table and none were seen in ZooKeeper. "
+          + "This can have two causes. First, the total number of tables going to/from "
+          + "zero during a GC cycle will cause this. Second, it could be caused by "
+          + "corruption of the metadata table and/or ZooKeeper. Only the second cause "
+          + "is problematic, but there is no way to distinguish between the two causes "
+          + "so this GC cycle will not proceed. The first cause should be transient "
+          + "and one would not expect to see this message repeated in subsequent GC cycles.");
+    }
+
+    // From that intersection, remove all the table ids that were seen.
+    tableIdsMustHaveSeen.removeAll(tableIdsSeen);
+
+    // If anything is left then we missed a table and may not have removed rfile references
+    // from the candidates list that are actually still in use, which would
+    // result in the rfiles being deleted in the next step of the GC process
+    if (!tableIdsMustHaveSeen.isEmpty()) {
+      log.error("TableIDs before: " + tableIdsBefore);
+      log.error("TableIDs after : " + tableIdsAfter);
+      log.error("TableIDs seen  : " + tableIdsSeen);
+      log.error("TableIDs that should have been seen but were not: " + tableIdsMustHaveSeen);
+      // maybe a scan failed?
+      throw new RuntimeException(
+          "Saw table IDs in ZK that were not in metadata table: " + tableIdsMustHaveSeen);
+    }

Review Comment:
   > The canonical determination of whether a table exists or not is that it has an entry in ZK... this is created before metadata entries, and is the last thing removed when a table is deleted.
   
   Good catch, we need to consider table states to avoid this race condition. I mentioned table states in #1377, but it's been so long that I had completely forgotten about that edge case, and I did not reread the issue until now.
   
   When a table is created the following is done.
   
    1. table is put in ZK w/ TableState.NEW
    2. metadata table is populated
    3. table's state is set to TableState.ONLINE or TableState.OFFLINE
   
   When a table is deleted the following is done.
   
    1. Table state is set to TableState.DELETING
    2. entries are removed from metadata table
    3. entries are removed from ZK
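   Given these orderings, the purely set-based check in the diff can fire on a healthy system. Below is a minimal Java sketch that replays the diff's intersection/`removeAll` steps on a hypothetical timeline where table `2` is mid-creation (in ZK as NEW, metadata not yet written) and table `3` is mid-deletion (metadata already removed, ZK entry not yet removed). The class name `SetCheckRace` is made up, table ids are plain Strings, the method returns the leftover ids instead of throwing, and the diff's empty-intersection guard is left out.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical replay of the set-based logic in ensureAllTablesChecked. Table ids are
 * Strings and the leftover set is returned rather than thrown.
 */
final class SetCheckRace {

  // Same steps as the diff: intersect before/after, then drop every id that was seen.
  static Set<String> missing(Set<String> before, Set<String> seen, Set<String> after) {
    Set<String> mustHaveSeen = new HashSet<>(before);
    mustHaveSeen.retainAll(after);
    mustHaveSeen.removeAll(seen);
    return mustHaveSeen;
  }

  public static void main(String[] args) {
    // Hypothetical timeline:
    //   GC reads ZK       -> "2" present (NEW), "3" present (DELETING)
    //   GC scans metadata -> only "1" has entries ("2" not written yet, "3" already removed)
    //   GC reads ZK again -> "2" present (now ONLINE), "3" still present (DELETING)
    Set<String> before = Set.of("1", "2", "3");
    Set<String> seen = Set.of("1");
    Set<String> after = Set.of("1", "2", "3");

    // Reports tables 2 and 3 as missed, so the check would throw although nothing is wrong.
    System.out.println(missing(before, seen, after));
  }
}
```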
   
   So from the perspective of GC, if we see a table with a state of TableState.ONLINE or TableState.OFFLINE both before and after scanning the metadata table, then it must have been seen in the metadata table unless there is a problem.
   
   We need to get a `Map<TableId,TableState>` to properly do this check.
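   One possible shape for that check, as a sketch only: `StateAwareCheck`, its nested `TableState` enum (a stand-in for Accumulo's `TableState`, limited to the four states named above), and the String table ids are hypothetical simplifications, not Accumulo's actual types or API. A table is required to appear in the metadata scan only if it was ONLINE or OFFLINE in both the before and after ZK snapshots.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical state-aware variant of ensureAllTablesChecked. TableState is a local
 * stand-in enum, ids are Strings, and offending ids are returned rather than thrown.
 */
final class StateAwareCheck {

  enum TableState { NEW, ONLINE, OFFLINE, DELETING }

  // ONLINE/OFFLINE are set only after metadata is populated and before deletion starts.
  // A null state (table absent from a snapshot) is treated as not settled.
  static boolean settled(TableState state) {
    return state == TableState.ONLINE || state == TableState.OFFLINE;
  }

  /** Returns ids that were ONLINE/OFFLINE in both ZK snapshots but absent from the scan. */
  static Set<String> missing(Map<String,TableState> before, Set<String> seen,
      Map<String,TableState> after) {
    Set<String> mustHaveSeen = new HashSet<>();
    for (Map.Entry<String,TableState> e : before.entrySet()) {
      // Tables that were NEW or DELETING in either snapshot, or absent from the after
      // snapshot, may legitimately lack metadata entries, so they are skipped.
      if (settled(e.getValue()) && settled(after.get(e.getKey()))) {
        mustHaveSeen.add(e.getKey());
      }
    }
    mustHaveSeen.removeAll(seen);
    return mustHaveSeen;
  }

  public static void main(String[] args) {
    // Hypothetical snapshots: "2" is mid-creation, "3" is mid-deletion.
    Map<String,TableState> before =
        Map.of("1", TableState.ONLINE, "2", TableState.NEW, "3", TableState.DELETING);
    Map<String,TableState> after =
        Map.of("1", TableState.ONLINE, "2", TableState.ONLINE, "3", TableState.DELETING);

    System.out.println(missing(before, Set.of("1"), after)); // prints []
    System.out.println(missing(before, Set.of(), after));    // prints [1]
  }
}
```

   Keying on the state in both snapshots matters: a table that is ONLINE before the scan could be set to DELETING mid-scan, and a table that is ONLINE after the scan could have been NEW when the scan started.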
   
   
   
     



-- 
This is an automated message from the Apache Git Service.