stefan-egli commented on code in PR #1328:
URL: https://github.com/apache/jackrabbit-oak/pull/1328#discussion_r1512625559


##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java:
##########
@@ -1238,12 +1240,11 @@ private void removeUnmergedBCRevision(final Revision 
unmergedBCRevision, final N
         }
 
         private void collectOldRevisions(final NodeDocument doc, final 
GCPhases phases, final UpdateOp updateOp) {

Review Comment:
   Would suggest to either rename to clarify that this is about old and 
between-checkpoint-but-not-younger-than-maxRevisionAgeMillis revisions - or to 
clarify that in a javadoc.
   
   Now, given the old-revision-gc part competes with OAK-10688 we should 
execute only one of the two, not both. One idea could be to add an static 
boolean (not a system property, just a plain boolean) that lets you run 
DetailedGC in either between-checkpoint mode (OAK-10535) or keep-1-revision 
mode (OAK-10688). Then tests can control that flag at their discretion, we can 
compare the two and find bugs in the two, but ultimately decide on which one we 
want.



##########
oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:
##########
@@ -74,15 +80,18 @@ public class RevisionsCommand implements Command {
 
     private static final Logger LOG = 
LoggerFactory.getLogger(RevisionsCommand.class);
 
+    private static final int REVISION_CAP = getInteger("oak.revision.cap", 
250);

Review Comment:
   (where) Is this still used?



##########
oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:
##########
@@ -460,6 +479,55 @@ private void sweep(RevisionsOptions options, Closer closer)
         SweepHelper.sweep(store, new RevisionContextWrapper(ns, clusterId), 
seeker);
     }
 
+    private void pathCleanup(RevisionsOptions options, Closer closer) throws 
IOException {
+        String path = options.getPath();
+        if (path == null || path.isEmpty()) {
+            System.err.println("path option is required for " + 
RevisionsOptions.CMD_CLEANUP + " command");
+            return;
+        }
+
+        DocumentNodeStoreBuilder<?> builder = createDocumentMKBuilder(options, 
closer);
+        if (builder == null) {
+            System.err.println("revisions mode only available for 
DocumentNodeStore");
+            return;
+        }
+
+        DocumentStore documentStore = builder.getDocumentStore();
+        builder.setReadOnlyMode();
+        useMemoryBlobStore(builder);
+        DocumentNodeStore documentNodeStore = builder.build();
+        String id = 
org.apache.jackrabbit.oak.plugins.document.util.Utils.getIdFromPath(path);
+        NodeDocument workingDocument = documentStore.find(NODES, id);
+        //NodeDocumentRevisionCleaner revisionCleaner = new 
NodeDocumentRevisionCleaner(documentNodeStore, workingDocument);

Review Comment:
   Is the idea to keep this as a reminder for a future change (or could it be 
removed now)?



##########
oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:
##########
@@ -74,15 +89,18 @@ public class RevisionsCommand implements Command {
 
     private static final Logger LOG = 
LoggerFactory.getLogger(RevisionsCommand.class);
 
+    private static final int REVISION_CAP = getInteger("oak.revision.cap", 
250);
+
     private static final String USAGE = Joiner.on(System.lineSeparator()).join(
             "revisions {<jdbc-uri> | <mongodb-uri>} <sub-command> [options]",
             "where sub-command is one of",
             "  info     give information about the revisions state without 
performing",
             "           any modifications",
-            "  collect      perform garbage collection",
-            "  reset        clear all persisted metadata",
-            "  sweep        clean up uncommitted changes",
-            "  detailedGC   perform detailed garbage collection i.e. remove 
unmerged branch commits, old revisions, deleted properties etc"
+            "  collect        perform garbage collection",
+            "  reset          clear all persisted metadata",
+            "  sweep          clean up uncommitted changes",
+            "  detailedGC     perform detailed garbage collection i.e. remove 
unmerged branch commits, old revisions, deleted properties etc",
+            "  pathCleanup    clean up old/unused revisions and unmerged 
branch commits on a specific path"

Review Comment:
   alternatively I was wondering if the restriction to a single path could not 
also be a further command line parameter of `detailedGC` mode? (in which case 
we wouldn't need a new mode just make `detailedGC` more fancy)



##########
oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:
##########
@@ -74,15 +80,18 @@ public class RevisionsCommand implements Command {
 
     private static final Logger LOG = 
LoggerFactory.getLogger(RevisionsCommand.class);
 
+    private static final int REVISION_CAP = getInteger("oak.revision.cap", 
250);
+
     private static final String USAGE = Joiner.on(System.lineSeparator()).join(
             "revisions {<jdbc-uri> | <mongodb-uri>} <sub-command> [options]",
             "where sub-command is one of",
             "  info     give information about the revisions state without 
performing",

Review Comment:
   ```suggestion
               "  info        give information about the revisions state 
without performing",
   ```
   (minor: indentation)



##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java:
##########
@@ -1398,6 +1399,24 @@ private void delayOnModifications(final long durationMs, 
final AtomicBoolean can
         }
     }
 
+    public UpdateOp collectGarbageOnDocument(DocumentNodeStore store, 
NodeDocument doc) {

Review Comment:
   As an alternative idea to the 2 steps suggested (`collectGarbageOnDocument` 
then removing the garbage in `RevisionsCommand`) which @rishabhdaim also 
commented on : what about focusing on executing the [core 
part](https://github.com/apache/jackrabbit-oak/blob/cf870b532b107b0c243e1d0180f1213d3f6e9197/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java#L736-L782)
 of `collectDetailedGarbage` in 2 different modes : the current full one, and a 
new `onPath` one. Eg that would imply refactoring the existing 
`collectDetailedGarbage`'s inner part out and then more or less directly 
invoking that on a particular path (perhaps have a small facade method that 
facilitates that).
   
   The problem I think is indeed, as @rishabhdaim  [already 
mentioned](https://github.com/apache/jackrabbit-oak/pull/1328/files#r1511381906),
 that some parts might be missed, if we have a second variant of 
`removeGarbage` (for example we wouldn't do the embedded verification). Of 
course doing it "the real way" does require some refactoring, but I think it 
could be done via that inner loop mentioned. Wdyt?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to