milleruntime commented on a change in pull request #2214:
URL: https://github.com/apache/accumulo/pull/2214#discussion_r677432267
##########
File path:
server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectionTest.java
##########
@@ -51,13 +58,29 @@
TreeMap<String,Status> filesToReplicate = new TreeMap<>();
@Override
- public boolean getCandidates(String continuePoint, List<String> ret) {
- Iterator<String> iter = candidates.tailSet(continuePoint,
false).iterator();
- while (iter.hasNext() && ret.size() < 3) {
- ret.add(iter.next());
+ public void processCandidates() throws TableNotFoundException, IOException
{
+
+ Iterator<String> candidatesIter = candidates.iterator();
+
+ while (candidatesIter.hasNext()) {
+ List<String> candidatesBatch =
readCandidatesThatFitInMemory(candidatesIter);
+ new GarbageCollectionAlgorithm().collectBatch(this, candidatesBatch);
}
+ // Remove all candidates that were tagged for deletion now that the
processing of
+ // candidates is complete for this round. This was removed from the
'delete' method
+ // due to that causing a ConcurrentModificationException. T
Review comment:
Extra character in comment.
```suggestion
// due to that causing a ConcurrentModificationException.
```
##########
File path:
server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionEnvironment.java
##########
@@ -40,19 +39,10 @@
public interface GarbageCollectionEnvironment {
/**
- * Return a list of paths to files and dirs which are candidates for
deletion from a given table,
- * {@link RootTable#NAME} or {@link MetadataTable#NAME}
- *
- * @param continuePoint
- * A row to resume from if a previous invocation was stopped due to
finding an extremely
- * large number of candidates to remove which would have exceeded
memory limitations
- * @param candidates
- * A collection of candidates files for deletion, may not be the
complete collection of
- * files for deletion at this point in time
- * @return true if the results are short due to insufficient memory,
otherwise false
+ * Process all possible deletion candidates for a given table, deleting
candidates that meet all
+ * necessary conditions.
*/
- boolean getCandidates(String continuePoint, List<String> candidates)
- throws TableNotFoundException;
+ void processCandidates() throws TableNotFoundException, IOException;
Review comment:
I wonder if this method should be split up. Previously we had one method
to gather the candidates and then another that processed them.
##########
File path:
server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionAlgorithm.java
##########
@@ -290,30 +290,26 @@ private void deleteConfirmed(GarbageCollectionEnvironment
gce,
cleanUpDeletedTableDirs(gce, candidateMap);
}
- public void collect(GarbageCollectionEnvironment gce) throws
TableNotFoundException, IOException {
-
- String lastCandidate = "";
-
- boolean outOfMemory = true;
- while (outOfMemory) {
- List<String> candidates = new ArrayList<>();
+ /**
+ * Given a sub-list of possible delection candidates, process and remove
valid deletion
+ * candidates.
+ */
+ public void collectBatch(GarbageCollectionEnvironment gce, List<String>
currentBatch)
+ throws TableNotFoundException, IOException {
- outOfMemory = getCandidates(gce, lastCandidate, candidates);
-
- if (candidates.isEmpty())
- break;
- else
- lastCandidate = candidates.get(candidates.size() - 1);
+ long origSize = currentBatch.size();
+ gce.incrementCandidatesStat(origSize);
- long origSize = candidates.size();
- gce.incrementCandidatesStat(origSize);
+ SortedMap<String,String> candidateMap = makeRelative(currentBatch);
- SortedMap<String,String> candidateMap = makeRelative(candidates);
+ confirmDeletesTrace(gce, candidateMap);
+ gce.incrementInUseStat(origSize - candidateMap.size());
- confirmDeletesTrace(gce, candidateMap);
- gce.incrementInUseStat(origSize - candidateMap.size());
+ deleteConfirmed(gce, candidateMap);
+ }
Review comment:
The previous `collect()` method was confusing but it was the only public
method of the `GarbageCollectionAlgorithm` so it was at least easier to follow
from the one entry point. Since you are refactoring the methods, it might be a
good opportunity to clean it up and make it easier to follow. The GC was
already confusing but the way you split up the methods, is more confusing to
me. It looks like you have the correct business logic between the methods but I
think it could be organized better.
It seems the simplest solution would be to keep the same methods but just
drop the continue point. Was there a reason you had to create new methods? I
saw your comment in the tests about ConcurrentModificationException but I am
not seeing where that could happen.
##########
File path:
server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionAlgorithm.java
##########
@@ -290,30 +290,26 @@ private void deleteConfirmed(GarbageCollectionEnvironment
gce,
cleanUpDeletedTableDirs(gce, candidateMap);
}
- public void collect(GarbageCollectionEnvironment gce) throws
TableNotFoundException, IOException {
-
- String lastCandidate = "";
-
- boolean outOfMemory = true;
- while (outOfMemory) {
- List<String> candidates = new ArrayList<>();
+ /**
+ * Given a sub-list of possible delection candidates, process and remove
valid deletion
Review comment:
```suggestion
* Given a sub-list of possible deletion candidates, process and remove
valid deletion
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]