[GitHub] [accumulo] keith-turner commented on a change in pull request #2293: closes #1377 - first stab at checking we hit all the tables

GitBox Mon, 18 Oct 2021 13:28:20 -0700


keith-turner commented on a change in pull request #2293:
URL: https://github.com/apache/accumulo/pull/2293#discussion_r731273093




##########
File path: 
server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectionTest.java
##########
@@ -130,7 +136,31 @@ public Value addDirReference(String tableId, String 
endRow, String dir) {
     }
 
     public Value removeDirReference(String tableId, String endRow) {
-      return references.remove(newDirReferenceKey(tableId, endRow));
+      Value retVal = references.remove(newDirReferenceKey(tableId, endRow));
+      removeDanglingTableIds(tableId);
+      return retVal;
+    }
+
+    /*
+     * this is to be called from removeDirReference or removeFileReference.
+     *
+     * If you just removed the last reference to a table, we need to remove it 
from
+     * the tableIds in zookeeper
+     */
+    private void removeDanglingTableIds(String tableId) {
+      for(Entry<Key, Value> entry : references.entrySet()) {
+        // check table id from row
+        // TODO: check table ids referenced in dir in || clause
+        if (entry.getKey().getRow().toString().equals(tableId)) {

Review comment:
       The row contains more than just the tableID,  Seems like this should do 
something like `String tableID = new 
String(KeyExtent.tableOfMetadataRow(key.getRow()));` to get the id. Copied that 
from elsewhere in this PR.
   
   Not completely sure I understand the intent of this method.  I think its 
meant to remove a tableId if there are no refs to it, I am not sure it does 
that though.  Could possibly do the the following if that is the intent.
   
   ```java
     bool inUse = references.keySet().stream().map(k -> new 
String(KeyExtent.tableOfMetadataRow(key.getRow()))).anyMatch(tid -> 
tableId.equals(tid));
     if(!inUse) {
        assertTrue(tableIds.remove(tableId));
     }
   ```
   

##########
File path: 
server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectionTest.java
##########
@@ -644,4 +692,132 @@ public void bulkImportReplicationRecordsPreventDeletion() 
throws Exception {
     assertEquals(1, gce.deletes.size());
     assertEquals("hdfs://foo.com:6000/accumulo/tables/2/t-00002/A000002.rf", 
gce.deletes.get(0));
   }
+
+  @Test
+  public void testMissingTableIds() throws Exception {
+    GarbageCollectionAlgorithm gca = new GarbageCollectionAlgorithm();
+
+    TestGCE gce = new TestGCE();
+
+    gce.candidates.add("hdfs://foo.com:6000/user/foo/tables/a/t-0/F00.rf");
+
+    gce.addFileReference("a", null, 
"hdfs://foo.com:6000/user/foo/tables/a/t-0/F00.rf");
+    gce.addFileReference("c", null, 
"hdfs://foo.com:6000/user/foo/tables/c/t-0/F00.rf");
+
+    // the following table ids must be seen in the references
+    gce.tableIds.add("a");
+    gce.tableIds.add("b");
+    gce.tableIds.add("c");
+    gce.tableIds.add("d");
+
+    String msg = assertThrows(RuntimeException.class, () -> 
gca.collect(gce)).getMessage();
+    assertTrue(msg, (msg.contains("[b, d]") || msg.contains("[d, b]"))
+        && msg.contains("Saw table IDs in ZK that were not in metadata 
table:"));
+  }
+
+  // below are tests for potential failure conditions of the GC process. Some 
of these cases were
+  // observed on clusters. Some were hypothesis based on observations. The 
result was that
+  // candidate entries were not removed when they should have been and 
therefore files were
+  // removed from HDFS that were actually still in use
+
+  private Set<String> makeUnmodifiableSet(String... args) {
+    return Collections.unmodifiableSet(new HashSet<>(Arrays.asList(args)));
+  }
+
+  @Test
+  public void testNormalGCRun() {
+    // happy path, no tables added or removed during this portion and all the 
tables checked
+    Set<String> tablesBefore = makeUnmodifiableSet("1", "2", "3");
+    Set<String> tablesSeen = makeUnmodifiableSet("2", "1", "3");
+    Set<String> tablesAfter = makeUnmodifiableSet("1", "3", "2");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, 
tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testTableAddedInMiddle() {
+    // table was added during this portion and we don't see it, should be fine
+    Set<String> tablesBefore = makeUnmodifiableSet("1", "2", "3");
+    Set<String> tablesSeen = makeUnmodifiableSet("2", "1", "3");
+    Set<String> tablesAfter = makeUnmodifiableSet("1", "3", "2", "4");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, 
tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testTableAddedInMiddleTwo() {
+    // table was added during this portion and we DO see it
+    // Means table was added after candidates were grabbed, so there should be 
nothing to remove
+    Set<String> tablesBefore = makeUnmodifiableSet("1", "2", "3");
+    Set<String> tablesSeen = makeUnmodifiableSet("2", "1", "3", "4");
+    Set<String> tablesAfter = makeUnmodifiableSet("1", "3", "2", "4");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, 
tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testTableDeletedInMiddle() {
+    // table was deleted during this portion and we don't see it
+    // this mean any candidates from the deleted table wil stay on the 
candidate list
+    // and during the delete step they will try to removed
+    Set<String> tablesBefore = makeUnmodifiableSet("1", "2", "3", "4");
+    Set<String> tablesSeen = makeUnmodifiableSet("2", "1", "4");
+    Set<String> tablesAfter = makeUnmodifiableSet("1", "2", "4");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, 
tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testTableDeletedInMiddleTwo() {
+    // table was deleted during this portion and we DO see it
+    // this mean candidates from the deleted table may get removed from the 
candidate list
+    // which should be ok, as the delete table function should be responsible 
for removing those
+    Set<String> tablesBefore = makeUnmodifiableSet("1", "2", "3", "4");
+    Set<String> tablesSeen = makeUnmodifiableSet("2", "1", "4", "3");
+    Set<String> tablesAfter = makeUnmodifiableSet("1", "2", "4");
+
+    new GarbageCollectionAlgorithm().ensureAllTablesChecked(tablesBefore, 
tablesSeen, tablesAfter);
+  }
+
+  @Test
+  public void testMissEntireTable() {
+    // this test simulates missing an entire table when looking for what files 
are in use
+    // if you add custom splits to the metadata at able boundaries, this can 
happen with a failed
+    // scan
+    // recall the ~tab:~pr for this first entry of a new table is empty, so 
there is now way to
+    // check the prior row. If you split a couple of tables in the metadata 
the table boundary
+    // , say table ids 2,3,4, and then miss scanning table 3 but get 4, it is 
possible other
+    // consistency checks will miss this
+    Set<String> tablesBefore = makeUnmodifiableSet("1", "2", "3", "4");
+    Set<String> tablesSeen = makeUnmodifiableSet("1", "2", "4");
+    Set<String> tablesAfter = makeUnmodifiableSet("1", "2", "3", "4");
+

Review comment:
       In addition to this test method it would be nice to have a few more test 
methods w/ variations for this scenario.  
   
   Like the following is a table that is missing and a table being added.
   
   ```
       Set<String> tablesBefore = makeUnmodifiableSet("1", "2", "3");
       Set<String> tablesSeen = makeUnmodifiableSet("1", "2", "4");
       Set<String> tablesAfter = makeUnmodifiableSet("1", "2", "3", "4");
   ```
   
   Could also have a table missing and a table being removed
   
   ```
       Set<String> tablesBefore = makeUnmodifiableSet("1", "2", "3","4");
       Set<String> tablesSeen = makeUnmodifiableSet("1", "2", "4");
       Set<String> tablesAfter = makeUnmodifiableSet("1", "2", "3");
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [accumulo] keith-turner commented on a change in pull request #2293: closes #1377 - first stab at checking we hit all the tables

Reply via email to