milleruntime commented on a change in pull request #2214:
URL: https://github.com/apache/accumulo/pull/2214#discussion_r677432267



##########
File path: 
server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectionTest.java
##########
@@ -51,13 +58,29 @@
     TreeMap<String,Status> filesToReplicate = new TreeMap<>();
 
     @Override
-    public boolean getCandidates(String continuePoint, List<String> ret) {
-      Iterator<String> iter = candidates.tailSet(continuePoint, 
false).iterator();
-      while (iter.hasNext() && ret.size() < 3) {
-        ret.add(iter.next());
+    public void processCandidates() throws TableNotFoundException, IOException 
{
+
+      Iterator<String> candidatesIter = candidates.iterator();
+
+      while (candidatesIter.hasNext()) {
+        List<String> candidatesBatch = 
readCandidatesThatFitInMemory(candidatesIter);
+        new GarbageCollectionAlgorithm().collectBatch(this, candidatesBatch);
       }
+      // Remove all candidates that were tagged for deletion now that the 
processing of
+      // candidates is complete for this round. This was removed from the 
'delete' method
+      // due to that causing a ConcurrentModificationException. T

Review comment:
       Extra character in comment.
   ```suggestion
         // due to that causing a ConcurrentModificationException.
   ```

##########
File path: 
server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionEnvironment.java
##########
@@ -40,19 +39,10 @@
 public interface GarbageCollectionEnvironment {
 
   /**
-   * Return a list of paths to files and dirs which are candidates for 
deletion from a given table,
-   * {@link RootTable#NAME} or {@link MetadataTable#NAME}
-   *
-   * @param continuePoint
-   *          A row to resume from if a previous invocation was stopped due to 
finding an extremely
-   *          large number of candidates to remove which would have exceeded 
memory limitations
-   * @param candidates
-   *          A collection of candidates files for deletion, may not be the 
complete collection of
-   *          files for deletion at this point in time
-   * @return true if the results are short due to insufficient memory, 
otherwise false
+   * Process all possible deletion candidates for a given table, deleting 
candidates that meet all
+   * necessary conditions.
    */
-  boolean getCandidates(String continuePoint, List<String> candidates)
-      throws TableNotFoundException;
+  void processCandidates() throws TableNotFoundException, IOException;

Review comment:
       I wonder if this method should be split up. Previously we had one method 
to gather the candidates and then another that processed them.

##########
File path: 
server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionAlgorithm.java
##########
@@ -290,30 +290,26 @@ private void deleteConfirmed(GarbageCollectionEnvironment 
gce,
     cleanUpDeletedTableDirs(gce, candidateMap);
   }
 
-  public void collect(GarbageCollectionEnvironment gce) throws 
TableNotFoundException, IOException {
-
-    String lastCandidate = "";
-
-    boolean outOfMemory = true;
-    while (outOfMemory) {
-      List<String> candidates = new ArrayList<>();
+  /**
+   * Given a sub-list of possible delection candidates, process and remove 
valid deletion
+   * candidates.
+   */
+  public void collectBatch(GarbageCollectionEnvironment gce, List<String> 
currentBatch)
+      throws TableNotFoundException, IOException {
 
-      outOfMemory = getCandidates(gce, lastCandidate, candidates);
-
-      if (candidates.isEmpty())
-        break;
-      else
-        lastCandidate = candidates.get(candidates.size() - 1);
+    long origSize = currentBatch.size();
+    gce.incrementCandidatesStat(origSize);
 
-      long origSize = candidates.size();
-      gce.incrementCandidatesStat(origSize);
+    SortedMap<String,String> candidateMap = makeRelative(currentBatch);
 
-      SortedMap<String,String> candidateMap = makeRelative(candidates);
+    confirmDeletesTrace(gce, candidateMap);
+    gce.incrementInUseStat(origSize - candidateMap.size());
 
-      confirmDeletesTrace(gce, candidateMap);
-      gce.incrementInUseStat(origSize - candidateMap.size());
+    deleteConfirmed(gce, candidateMap);
+  }

Review comment:
       The previous `collect()` method was confusing but it was the only public 
method of the `GarbageCollectionAlgorithm` so it was at least easier to follow 
from the one entry point. Since you are refactoring the methods, it might be a 
good opportunity to clean it up and make it easier to follow. The GC was 
already confusing but the way you split up the methods, is more confusing to 
me. It looks like you have the correct business logic between the methods but I 
think it could be organized better.
   
   It seems the simplest solution would be to keep the same methods but just 
drop the continue point. Was there a reason you had to create new methods? I 
saw your comment in the tests about ConcurrentModificationException but I am 
not seeing where that could happen.

##########
File path: 
server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionAlgorithm.java
##########
@@ -290,30 +290,26 @@ private void deleteConfirmed(GarbageCollectionEnvironment 
gce,
     cleanUpDeletedTableDirs(gce, candidateMap);
   }
 
-  public void collect(GarbageCollectionEnvironment gce) throws 
TableNotFoundException, IOException {
-
-    String lastCandidate = "";
-
-    boolean outOfMemory = true;
-    while (outOfMemory) {
-      List<String> candidates = new ArrayList<>();
+  /**
+   * Given a sub-list of possible delection candidates, process and remove 
valid deletion

Review comment:
       ```suggestion
      * Given a sub-list of possible deletion candidates, process and remove 
valid deletion
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to