stefan-egli commented on code in PR #1328: URL: https://github.com/apache/jackrabbit-oak/pull/1328#discussion_r1512625559
########## oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java: ########## @@ -1238,12 +1240,11 @@ private void removeUnmergedBCRevision(final Revision unmergedBCRevision, final N } private void collectOldRevisions(final NodeDocument doc, final GCPhases phases, final UpdateOp updateOp) { Review Comment: Would suggest to either rename to clarify that this is about old and between-checkpoint-but-not-younger-than-maxRevisionAgeMillis revisions - or to clarify that in a javadoc. Now, given the old-revision-gc part competes with OAK-10688 we should execute only one of the two, not both. One idea could be to add an static boolean (not a system property, just a plain boolean) that lets you run DetailedGC in either between-checkpoint mode (OAK-10535) or keep-1-revision mode (OAK-10688). Then tests can control that flag at their discretion, we can compare the two and find bugs in the two, but ultimately decide on which one we want. ########## oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java: ########## @@ -74,15 +80,18 @@ public class RevisionsCommand implements Command { private static final Logger LOG = LoggerFactory.getLogger(RevisionsCommand.class); + private static final int REVISION_CAP = getInteger("oak.revision.cap", 250); Review Comment: (where) Is this still used? ########## oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java: ########## @@ -460,6 +479,55 @@ private void sweep(RevisionsOptions options, Closer closer) SweepHelper.sweep(store, new RevisionContextWrapper(ns, clusterId), seeker); } + private void pathCleanup(RevisionsOptions options, Closer closer) throws IOException { + String path = options.getPath(); + if (path == null || path.isEmpty()) { + System.err.println("path option is required for " + RevisionsOptions.CMD_CLEANUP + " command"); + return; + } + + DocumentNodeStoreBuilder<?> builder = createDocumentMKBuilder(options, closer); + if (builder == null) { + System.err.println("revisions mode only available for DocumentNodeStore"); + return; + } + + DocumentStore documentStore = builder.getDocumentStore(); + builder.setReadOnlyMode(); + useMemoryBlobStore(builder); + DocumentNodeStore documentNodeStore = builder.build(); + String id = org.apache.jackrabbit.oak.plugins.document.util.Utils.getIdFromPath(path); + NodeDocument workingDocument = documentStore.find(NODES, id); + //NodeDocumentRevisionCleaner revisionCleaner = new NodeDocumentRevisionCleaner(documentNodeStore, workingDocument); Review Comment: Is the idea to keep this as a reminder for a future change (or could it be removed now)? ########## oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java: ########## @@ -74,15 +89,18 @@ public class RevisionsCommand implements Command { private static final Logger LOG = LoggerFactory.getLogger(RevisionsCommand.class); + private static final int REVISION_CAP = getInteger("oak.revision.cap", 250); + private static final String USAGE = Joiner.on(System.lineSeparator()).join( "revisions {<jdbc-uri> | <mongodb-uri>} <sub-command> [options]", "where sub-command is one of", " info give information about the revisions state without performing", " any modifications", - " collect perform garbage collection", - " reset clear all persisted metadata", - " sweep clean up uncommitted changes", - " detailedGC perform detailed garbage collection i.e. remove unmerged branch commits, old revisions, deleted properties etc" + " collect perform garbage collection", + " reset clear all persisted metadata", + " sweep clean up uncommitted changes", + " detailedGC perform detailed garbage collection i.e. remove unmerged branch commits, old revisions, deleted properties etc", + " pathCleanup clean up old/unused revisions and unmerged branch commits on a specific path" Review Comment: alternatively I was wondering if the restriction to a single path could not also be a further command line parameter of `detailedGC` mode? (in which case we wouldn't need a new mode just make `detailedGC` more fancy) ########## oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java: ########## @@ -74,15 +80,18 @@ public class RevisionsCommand implements Command { private static final Logger LOG = LoggerFactory.getLogger(RevisionsCommand.class); + private static final int REVISION_CAP = getInteger("oak.revision.cap", 250); + private static final String USAGE = Joiner.on(System.lineSeparator()).join( "revisions {<jdbc-uri> | <mongodb-uri>} <sub-command> [options]", "where sub-command is one of", " info give information about the revisions state without performing", Review Comment: ```suggestion " info give information about the revisions state without performing", ``` (minor: indentation) ########## oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java: ########## @@ -1398,6 +1399,24 @@ private void delayOnModifications(final long durationMs, final AtomicBoolean can } } + public UpdateOp collectGarbageOnDocument(DocumentNodeStore store, NodeDocument doc) { Review Comment: As an alternative idea to the 2 steps suggested (`collectGarbageOnDocument` then removing the garbage in `RevisionsCommand`) which @rishabhdaim also commented on : what about focusing on executing the [core part](https://github.com/apache/jackrabbit-oak/blob/cf870b532b107b0c243e1d0180f1213d3f6e9197/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java#L736-L782) of `collectDetailedGarbage` in 2 different modes : the current full one, and a new `onPath` one. Eg that would imply refactoring the existing `collectDetailedGarbage`'s inner part out and then more or less directly invoking that on a particular path (perhaps have a small facade method that facilitates that). The problem I think is indeed, as @rishabhdaim [already mentioned](https://github.com/apache/jackrabbit-oak/pull/1328/files#r1511381906), that some parts might be missed, if we have a second variant of `removeGarbage` (for example we wouldn't do the embedded verification). Of course doing it "the real way" does require some refactoring, but I think it could be done via that inner loop mentioned. Wdyt? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org