[jira] [Commented] (OAK-10843) Flaky fullgc tests
[ https://issues.apache.org/jira/browse/OAK-10843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850688#comment-17850688 ] Stefan Egli commented on OAK-10843: --- PR for that flaky run -> https://github.com/apache/jackrabbit-oak/pull/1497 > Flaky fullgc tests > -- > > Key: OAK-10843 > URL: https://issues.apache.org/jira/browse/OAK-10843 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Priority: Major > > As noted in OAK-10739 there is potential flakyness in tests with fullgc modes > "BETWEEN_CHECKPOINTS" and others. This ticket is to look into fixing it. > First measure could be to disable them until we have a fix. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10843) Flaky fullgc tests
[ https://issues.apache.org/jira/browse/OAK-10843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850687#comment-17850687 ] Stefan Egli commented on OAK-10843: --- [~reschke], I'm going to exclude the following flaky run : {noformat} [ERROR] testBundledPropUnmergedBCGC[7: MongoFixture: MongoDB with ORPHANS_EMPTYPROPS_KEEP_ONE_ALL_PROPS](org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollectorIT) Time elapsed: 0.578 s <<< FAILURE! java.lang.AssertionError: ORPHANS_EMPTYPROPS_KEEP_ONE_ALL_PROPS/internalProps expected:<2> but was:<1> at org.junit.Assert.fail(Assert.java:89) at org.junit.Assert.failNotEquals(Assert.java:835) at org.junit.Assert.assertEquals(Assert.java:647) at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollectorIT.assertStatsCountsEqual(VersionGarbageCollectorIT.java:1227) at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollectorIT.testBundledPropUnmergedBCGC(VersionGarbageCollectorIT.java:1943) {noformat} > Flaky fullgc tests > -- > > Key: OAK-10843 > URL: https://issues.apache.org/jira/browse/OAK-10843 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Priority: Major > > As noted in OAK-10739 there is potential flakyness in tests with fullgc modes > "BETWEEN_CHECKPOINTS" and others. This ticket is to look into fixing it. > First measure could be to disable them until we have a fix. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10844) speed up fullgc tests
[ https://issues.apache.org/jira/browse/OAK-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850616#comment-17850616 ] Stefan Egli commented on OAK-10844: --- Initial PR to (1) reduce nr of test combinations and (2) add logging to help narrowing down where time is spent : https://github.com/apache/jackrabbit-oak/pull/1494 > speed up fullgc tests > - > > Key: OAK-10844 > URL: https://issues.apache.org/jira/browse/OAK-10844 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Priority: Major > > The new fullgc tests are unacceptably slow on some environments (in the range > of 2h). It might have to do with mongo in docker - but it's also multiplied > with many permutations of fullgcmode and fixtures that are run. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10844) speed up fullgc tests
Stefan Egli created OAK-10844: - Summary: speed up fullgc tests Key: OAK-10844 URL: https://issues.apache.org/jira/browse/OAK-10844 Project: Jackrabbit Oak Issue Type: Task Components: documentmk Reporter: Stefan Egli The new fullgc tests are unacceptably slow on some environments (in the range of 2h). It might have to do with mongo in docker - but it's also multiplied with many permutations of fullgcmode and fixtures that are run. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10844) speed up fullgc tests
[ https://issues.apache.org/jira/browse/OAK-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10844: -- Epic Link: OAK-10739 > speed up fullgc tests > - > > Key: OAK-10844 > URL: https://issues.apache.org/jira/browse/OAK-10844 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Priority: Major > > The new fullgc tests are unacceptably slow on some environments (in the range > of 2h). It might have to do with mongo in docker - but it's also multiplied > with many permutations of fullgcmode and fixtures that are run. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10843) Flaky fullgc tests
[ https://issues.apache.org/jira/browse/OAK-10843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10843: -- Epic Link: OAK-10739 > Flaky fullgc tests > -- > > Key: OAK-10843 > URL: https://issues.apache.org/jira/browse/OAK-10843 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Priority: Major > > As noted in OAK-10739 there is potential flakyness in tests with fullgc modes > "BETWEEN_CHECKPOINTS" and others. This ticket is to look into fixing it. > First measure could be to disable them until we have a fix. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10843) Flaky fullgc tests
[ https://issues.apache.org/jira/browse/OAK-10843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10843: -- Description: As noted in OAK-10739 there is potential flakyness in tests with fullgc modes "BETWEEN_CHECKPOINTS" and others. This ticket is to look into fixing it. First measure could be to disable them until we have a fix. (was: As noted in OAK-10739 there is potential flakyness in tests with fullgc modes "BETWEEN_CHECKPOINTS". This ticket is to look into fixing it. First measure could be to disable them until we have a fix.) > Flaky fullgc tests > -- > > Key: OAK-10843 > URL: https://issues.apache.org/jira/browse/OAK-10843 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Priority: Major > > As noted in OAK-10739 there is potential flakyness in tests with fullgc modes > "BETWEEN_CHECKPOINTS" and others. This ticket is to look into fixing it. > First measure could be to disable them until we have a fix. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10843) Flaky fullgc tests
[ https://issues.apache.org/jira/browse/OAK-10843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10843: -- Summary: Flaky fullgc tests (was: Flaky between-cp tests) > Flaky fullgc tests > -- > > Key: OAK-10843 > URL: https://issues.apache.org/jira/browse/OAK-10843 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Priority: Major > > As noted in OAK-10739 there is potential flakyness in tests with fullgc modes > "BETWEEN_CHECKPOINTS". This ticket is to look into fixing it. First measure > could be to disable them until we have a fix. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10739) Provide Support for Full Garbage Collection in Mongo Document Store
[ https://issues.apache.org/jira/browse/OAK-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850350#comment-17850350 ] Stefan Egli commented on OAK-10739: --- One suggestion is to disable the flaky tests - eg done for VersionGarbageCollectorIT in OAK-10843 > Provide Support for Full Garbage Collection in Mongo Document Store > --- > > Key: OAK-10739 > URL: https://issues.apache.org/jira/browse/OAK-10739 > Project: Jackrabbit Oak > Issue Type: Epic >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > Attachments: > org.apache.jackrabbit.oak.plugins.document.BranchCommitGCTest.txt, > org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollectorIT.txt > > > We need to provide the support to collect & remove the full garbage for > DocumentNodeStore. > At the time of creating this epic garbage includes orphaned nodes, deleted > properties, unmerged branch commits, and old revisions. > > This list can be updated in case a new type of garbage is found. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10843) Flaky between-cp tests
[ https://issues.apache.org/jira/browse/OAK-10843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850348#comment-17850348 ] Stefan Egli commented on OAK-10843: --- I meant to create a PR first, but the change went straight to trunk: https://github.com/apache/jackrabbit-oak/commit/e488628c018a08ad40c8f36281a076b1ef2dd9a6 > Flaky between-cp tests > -- > > Key: OAK-10843 > URL: https://issues.apache.org/jira/browse/OAK-10843 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Priority: Major > > As noted in OAK-10739 there is potential flakiness in tests with fullgc modes > "BETWEEN_CHECKPOINTS". This ticket is to look into fixing it. A first measure > could be to disable them until we have a fix. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10843) Flaky between-cp tests
Stefan Egli created OAK-10843: - Summary: Flaky between-cp tests Key: OAK-10843 URL: https://issues.apache.org/jira/browse/OAK-10843 Project: Jackrabbit Oak Issue Type: Task Components: documentmk Reporter: Stefan Egli As noted in OAK-10739 there is potential flakiness in tests with fullgc modes "BETWEEN_CHECKPOINTS". This ticket is to look into fixing it. A first measure could be to disable them until we have a fix. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10739) Provide Support for Full Garbage Collection in Mongo Document Store
[ https://issues.apache.org/jira/browse/OAK-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846348#comment-17846348 ] Stefan Egli commented on OAK-10739: --- Yes it is for now only implemented for MongoDB. The largest part is outside of DocumentStore though, so adding RDB support is a relatively small effort (compared to the actual feature) > Provide Support for Full Garbage Collection in Mongo Document Store > --- > > Key: OAK-10739 > URL: https://issues.apache.org/jira/browse/OAK-10739 > Project: Jackrabbit Oak > Issue Type: Epic >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > > We need to provide the support to collect & remove the full garbage for > DocumentNodeStore. > At the time of creating this epic garbage includes orphaned nodes, deleted > properties, unmerged branch commits, and old revisions. > > This list can be updated in case a new type of garbage is found. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10792) Rename DetailedGC to FullGC
[ https://issues.apache.org/jira/browse/OAK-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846330#comment-17846330 ] Stefan Egli commented on OAK-10792: --- Also added https://github.com/apache/jackrabbit-oak/pull/1458 > Rename DetailedGC to FullGC > --- > > Key: OAK-10792 > URL: https://issues.apache.org/jira/browse/OAK-10792 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core >Reporter: Daniel Iancu >Assignee: Rishabh Daim >Priority: Minor > Labels: DetailedGC > > Switching to FullGC instead of DetailedGC everywhere, method names, > constants, arguments etc -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10739) Provide Support for Full Garbage Collection in Mongo Document Store
[ https://issues.apache.org/jira/browse/OAK-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846265#comment-17846265 ] Stefan Egli commented on OAK-10739: --- The PR for merging the main feature branch to trunk has just been created at https://github.com/apache/jackrabbit-oak/pull/1454 > Provide Support for Full Garbage Collection in Mongo Document Store > --- > > Key: OAK-10739 > URL: https://issues.apache.org/jira/browse/OAK-10739 > Project: Jackrabbit Oak > Issue Type: Epic >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > > We need to provide the support to collect & remove the full garbage for > DocumentNodeStore. > At the time of creating this epic garbage includes orphaned nodes, deleted > properties, unmerged branch commits, and old revisions. > > This list can be updated in case a new type of garbage is found. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10792) Rename DetailedGC to FullGC
[ https://issues.apache.org/jira/browse/OAK-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10792: -- Labels: DetailedGC (was: ) > Rename DetailedGC to FullGC > --- > > Key: OAK-10792 > URL: https://issues.apache.org/jira/browse/OAK-10792 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core >Reporter: Daniel Iancu >Priority: Minor > Labels: DetailedGC > > Switching to FullGC instead of DetailedGC everywhere, method names, > constants, arguments etc -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10792) Rename DetailedGC to FullGC
[ https://issues.apache.org/jira/browse/OAK-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10792: -- Epic Link: OAK-10739 > Rename DetailedGC to FullGC > --- > > Key: OAK-10792 > URL: https://issues.apache.org/jira/browse/OAK-10792 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core >Reporter: Daniel Iancu >Priority: Minor > > Switching to FullGC instead of DetailedGC everywhere, method names, > constants, arguments etc -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (OAK-10764) gap orphans improvement : only lookup greatest existing ancestor, then cache
[ https://issues.apache.org/jira/browse/OAK-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-10764. --- Resolution: Done PR got merged > gap orphans improvement : only lookup greatest existing ancestor, then cache > > > Key: OAK-10764 > URL: https://issues.apache.org/jira/browse/OAK-10764 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > Follow-up of OAK-10761 : to further improve gap vs non-gap orphan type > detection, lookup only the direct child of the greatest existing ancestor, > then cache that result. Upon further type detection, use that cache to start > with. That eliminates the tree traversal, plus using the cache should further > limit the number of lookups in the first place. The cache can be small (eg > 64) and assuming paths aren't excessive (say max 10k) the resulting memory > usage would still be small (eg 64k characters). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10763) DetailedGC is skipping deletion of deleted props for orphan nodes
[ https://issues.apache.org/jira/browse/OAK-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837635#comment-17837635 ] Stefan Egli commented on OAK-10763: --- PS: merged https://github.com/apache/jackrabbit-oak/pull/1421 > DetailedGC is skipping deletion of deleted props for orphan nodes > - > > Key: OAK-10763 > URL: https://issues.apache.org/jira/browse/OAK-10763 > Project: Jackrabbit Oak > Issue Type: Bug >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > > DetailedGC is skipping the deletion of deleted props for orphan nodes due to > the absence of ancestors in some cases. > In case there is a gap in ancestors then we should pass the verification and > proceed with deletion of deleted props. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10763) DetailedGC is skipping deletion of deleted props for orphan nodes
[ https://issues.apache.org/jira/browse/OAK-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837384#comment-17837384 ] Stefan Egli commented on OAK-10763: --- Created a PR for an alternative suggestion in https://github.com/apache/jackrabbit-oak/pull/1421 > DetailedGC is skipping deletion of deleted props for orphan nodes > - > > Key: OAK-10763 > URL: https://issues.apache.org/jira/browse/OAK-10763 > Project: Jackrabbit Oak > Issue Type: Bug >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > > DetailedGC is skipping the deletion of deleted props for orphan nodes due to > the absence of ancestors in some cases. > In case there is a gap in ancestors then we should pass the verification and > proceed with deletion of deleted props. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10764) gap orphans improvement : only lookup greatest existing ancestor, then cache
[ https://issues.apache.org/jira/browse/OAK-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837383#comment-17837383 ] Stefan Egli commented on OAK-10764: --- note : created PR on top of the previous one : https://github.com/apache/jackrabbit-oak/pull/1421 > gap orphans improvement : only lookup greatest existing ancestor, then cache > > > Key: OAK-10764 > URL: https://issues.apache.org/jira/browse/OAK-10764 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > Follow-up of OAK-10761 : to further improve gap vs non-gap orphan type > detection, lookup only the direct child of the greatest existing ancestor, > then cache that result. Upon further type detection, use that cache to start > with. That eliminates the tree traversal, plus using the cache should further > limit the number of lookups in the first place. The cache can be small (eg > 64) and assuming paths aren't excessive (say max 10k) the resulting memory > usage would still be small (eg 64k characters). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10764) gap orphans improvement : only lookup greatest existing ancestor, then cache
[ https://issues.apache.org/jira/browse/OAK-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837381#comment-17837381 ] Stefan Egli commented on OAK-10764: --- PR created at https://github.com/apache/jackrabbit-oak/pull/1420 > gap orphans improvement : only lookup greatest existing ancestor, then cache > > > Key: OAK-10764 > URL: https://issues.apache.org/jira/browse/OAK-10764 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > Follow-up of OAK-10761 : to further improve gap vs non-gap orphan type > detection, lookup only the direct child of the greatest existing ancestor, > then cache that result. Upon further type detection, use that cache to start > with. That eliminates the tree traversal, plus using the cache should further > limit the number of lookups in the first place. The cache can be small (eg > 64) and assuming paths aren't excessive (say max 10k) the resulting memory > usage would still be small (eg 64k characters). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10764) gap orphans improvement : only lookup greatest existing ancestor, then cache
Stefan Egli created OAK-10764: - Summary: gap orphans improvement : only lookup greatest existing ancestor, then cache Key: OAK-10764 URL: https://issues.apache.org/jira/browse/OAK-10764 Project: Jackrabbit Oak Issue Type: Task Components: documentmk Reporter: Stefan Egli Assignee: Stefan Egli Follow-up of OAK-10761 : to further improve gap vs non-gap orphan type detection, lookup only the direct child of the greatest existing ancestor, then cache that result. Upon further type detection, use that cache to start with. That eliminates the tree traversal, plus using the cache should further limit the number of lookups in the first place. The cache can be small (eg 64) and assuming paths aren't excessive (say max 10k) the resulting memory usage would still be small (eg 64k characters). -- This message was sent by Atlassian Jira (v8.20.10#820010)
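The caching idea in OAK-10764 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the linked PRs: the class `GapOrphanCache`, its methods, and the `documentExists` predicate are hypothetical names, paths are plain strings, and the greatest existing ancestor is assumed to be already known from the state traversal. The cache is a small LRU map keyed by the direct child of that ancestor, so its memory use is bounded by the capacity times the maximum path length.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical helper illustrating the idea; not Oak's actual implementation. */
class GapOrphanCache {
    // Maps "direct child of the greatest existing ancestor" -> "its document is missing".
    private final Map<String, Boolean> missingByChild;
    // Assumed stand-in for a DocumentStore lookup.
    private final Predicate<String> documentExists;
    // Counts real lookups so the effect of the cache can be observed.
    int lookups = 0;

    GapOrphanCache(final int capacity, Predicate<String> documentExists) {
        this.documentExists = documentExists;
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        this.missingByChild = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the child of 'ancestor' that lies on the way to 'path' (a strict descendant). */
    static String directChild(String ancestor, String path) {
        int start = ancestor.equals("/") ? 1 : ancestor.length() + 1;
        int end = path.indexOf('/', start);
        return end < 0 ? path : path.substring(0, end);
    }

    /** True if the orphan is a "gap" orphan, i.e. the first non-existing ancestor has no document. */
    boolean isGapOrphan(String greatestExistingAncestor, String orphanPath) {
        String child = directChild(greatestExistingAncestor, orphanPath);
        Boolean missing = missingByChild.get(child);
        if (missing == null) {
            lookups++; // only one lookup per distinct child, no tree traversal
            missing = !documentExists.test(child);
            missingByChild.put(child, missing);
        }
        return missing;
    }
}
```

With a capacity of 64, two orphans below the same missing subtree (for example `/a/b/c/d` and `/a/b/c/e` with greatest existing ancestor `/a`) would cause a single lookup for `/a/b`; the second check is answered from the cache.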
[jira] [Resolved] (OAK-10761) gap orphans improvement : ignore greatest existing ancestors
[ https://issues.apache.org/jira/browse/OAK-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-10761. --- Resolution: Done PR merged > gap orphans improvement : ignore greatest existing ancestors > > > Key: OAK-10761 > URL: https://issues.apache.org/jira/browse/OAK-10761 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > The gap orphan mode introduced with OAK-10743 starts its gap test at root - > while it could easily skip the first few elements that it knows do exist (as > so determined by the original state traversal). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10761) gap orphans improvement : ignore greatest existing ancestors
[ https://issues.apache.org/jira/browse/OAK-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837270#comment-17837270 ] Stefan Egli commented on OAK-10761: --- PR created at https://github.com/apache/jackrabbit-oak/pull/1414 > gap orphans improvement : ignore greatest existing ancestors > > > Key: OAK-10761 > URL: https://issues.apache.org/jira/browse/OAK-10761 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > The gap orphan mode introduced with OAK-10743 starts its gap test at root - > while it could easily skip the first few elements that it knows do exist (as > so determined by the original state traversal). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10761) gap orphans improvement : ignore greatest existing ancestors
Stefan Egli created OAK-10761: - Summary: gap orphans improvement : ignore greatest existing ancestors Key: OAK-10761 URL: https://issues.apache.org/jira/browse/OAK-10761 Project: Jackrabbit Oak Issue Type: Task Components: documentmk Reporter: Stefan Egli Assignee: Stefan Egli The gap orphan mode introduced with OAK-10743 starts its gap test at root - while it could easily skip the first few elements that it knows do exist (as so determined by the original state traversal). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (OAK-10743) Split orphaned gc mode into two : with-gap, without-gap
[ https://issues.apache.org/jira/browse/OAK-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-10743. --- Resolution: Done PR merged, marking done > Split orphaned gc mode into two : with-gap, without-gap > --- > > Key: OAK-10743 > URL: https://issues.apache.org/jira/browse/OAK-10743 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > As a hardening mitigation option we should split the orphaned node deletion > into two different ones: > * with-gap : an orphan that is orphan because one of its parents *document* > (not only node) doesn't exist > * without-gap : an orphan that is orphan because or late-write -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10743) Split orphaned gc mode into two : with-gap, without-gap
[ https://issues.apache.org/jira/browse/OAK-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834014#comment-17834014 ] Stefan Egli commented on OAK-10743: --- PR created : https://github.com/apache/jackrabbit-oak/pull/1404 > Split orphaned gc mode into two : with-gap, without-gap > --- > > Key: OAK-10743 > URL: https://issues.apache.org/jira/browse/OAK-10743 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > As a hardening mitigation option we should split the orphaned node deletion > into two different ones: > * with-gap : an orphan that is orphan because one of its parents *document* > (not only node) doesn't exist > * without-gap : an orphan that is orphan because or late-write -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10743) Split orphaned gc mode into two : with-gap, without-gap
Stefan Egli created OAK-10743: - Summary: Split orphaned gc mode into two : with-gap, without-gap Key: OAK-10743 URL: https://issues.apache.org/jira/browse/OAK-10743 Project: Jackrabbit Oak Issue Type: Task Components: documentmk Reporter: Stefan Egli Assignee: Stefan Egli As a hardening mitigation option we should split the orphaned node deletion into two different ones: * with-gap : an orphan that is an orphan because the *document* (not only the node) of one of its parents doesn't exist * without-gap : an orphan that is an orphan because of a late-write -- This message was sent by Atlassian Jira (v8.20.10#820010)
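The two orphan types described in OAK-10743 can be told apart with a check along these lines. This is a minimal sketch under assumptions: `OrphanClassifier`, `OrphanType`, and the `documentExists` predicate are hypothetical names, paths are plain strings, the caller has already established that the path is an orphan, and the root document is assumed to exist. It walks the ancestors starting at the root, which is the naive variant.

```java
import java.util.function.Predicate;

/** Hypothetical helper illustrating the with-gap / without-gap split; not Oak's actual code. */
class OrphanClassifier {
    enum OrphanType { WITH_GAP, WITHOUT_GAP }

    /**
     * Classifies a path that is already known to be an orphan.
     * WITH_GAP: the document of at least one ancestor is missing.
     * WITHOUT_GAP: all ancestor documents exist, so the orphan stems from something else (e.g. a late-write).
     */
    static OrphanType classify(String orphanPath, Predicate<String> documentExists) {
        // Visit "/a", "/a/b", ... for an orphan path such as "/a/b/c".
        int idx = orphanPath.indexOf('/', 1);
        while (idx > 0) {
            String ancestor = orphanPath.substring(0, idx);
            if (!documentExists.test(ancestor)) {
                return OrphanType.WITH_GAP;
            }
            idx = orphanPath.indexOf('/', idx + 1);
        }
        return OrphanType.WITHOUT_GAP;
    }
}
```

Keeping the two types separate in this way would allow each deletion mode to be enabled or disabled independently, which is the hardening aspect the ticket mentions.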
[jira] [Updated] (OAK-10742) Introduce include/exclude lists for detailedGC
[ https://issues.apache.org/jira/browse/OAK-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10742: -- Labels: DetailedGC (was: ) > Introduce include/exclude lists for detailedGC > -- > > Key: OAK-10742 > URL: https://issues.apache.org/jira/browse/OAK-10742 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > We should have config options that define: > * an exclude list of wildcard based paths that are ignored for detailed gc in > general. one example for this could be to not gc > /jcr:system/jcr:versionStorage at all. This option should apply to all > different detailedGC modes > * a temporary (deprecated) option for an include list of wildcard based paths > that should only be considered for gc. This option should apply to > deleteEmptyProperties (initially). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10742) Introduce include/exclude lists for detailedGC
Stefan Egli created OAK-10742: - Summary: Introduce include/exclude lists for detailedGC Key: OAK-10742 URL: https://issues.apache.org/jira/browse/OAK-10742 Project: Jackrabbit Oak Issue Type: Task Components: documentmk Reporter: Stefan Egli Assignee: Stefan Egli We should have config options that define: * an exclude list of wildcard based paths that are ignored for detailed gc in general. one example for this could be to not gc /jcr:system/jcr:versionStorage at all. This option should apply to all different detailedGC modes * a temporary (deprecated) option for an include list of wildcard based paths that should only be considered for gc. This option should apply to deleteEmptyProperties (initially). -- This message was sent by Atlassian Jira (v8.20.10#820010)
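A wildcard-based include/exclude filter as proposed in OAK-10742 might look like the sketch below. All names (`GcPathFilter`, `shouldGc`, `globToPattern`) are hypothetical, and the semantics are assumptions made for illustration: `*` matches any character sequence including `/`, an exclude match always wins, and an empty include list means "consider everything". The ticket does not specify these details, and it scopes the include list to deleteEmptyProperties only, which this sketch does not model.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical path filter illustrating include/exclude lists; not Oak's actual configuration API. */
class GcPathFilter {
    private final List<Pattern> includes;
    private final List<Pattern> excludes;

    GcPathFilter(List<String> includeGlobs, List<String> excludeGlobs) {
        this.includes = compile(includeGlobs);
        this.excludes = compile(excludeGlobs);
    }

    private static List<Pattern> compile(List<String> globs) {
        return globs.stream().map(GcPathFilter::globToPattern).collect(Collectors.toList());
    }

    /** Converts a glob where '*' matches any sequence (assumption) into a regex. */
    static Pattern globToPattern(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            regex.append(c == '*' ? ".*" : Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString());
    }

    private static boolean matchesAny(List<Pattern> patterns, String path) {
        for (Pattern p : patterns) {
            if (p.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Excludes win; an empty include list means every non-excluded path is considered. */
    boolean shouldGc(String path) {
        if (matchesAny(excludes, path)) {
            return false;
        }
        return includes.isEmpty() || matchesAny(includes, path);
    }
}
```

Under these assumptions, an exclude entry of `/jcr:system/jcr:versionStorage*` would keep the version storage subtree out of GC entirely, matching the example given in the ticket.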
[jira] [Updated] (OAK-10739) Provide Support for Detailed Garbage Collection in Document Node Store
[ https://issues.apache.org/jira/browse/OAK-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10739: -- Labels: DetailedGC (was: ) > Provide Support for Detailed Garbage Collection in Document Node Store > -- > > Key: OAK-10739 > URL: https://issues.apache.org/jira/browse/OAK-10739 > Project: Jackrabbit Oak > Issue Type: Epic >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > > We need to provide the support to collect & remove the full garbage for > DocumentNodeStore. > At the time of creating this epic garbage includes orphaned nodes, deleted > properties, unmerged branch commits, and old revisions. > > This list can be updated in case a new type of garbage is found. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (OAK-8646) Clean up changes from orphaned branch commits
[ https://issues.apache.org/jira/browse/OAK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-8646. -- Resolution: Done +1, marking done then > Clean up changes from orphaned branch commits > - > > Key: OAK-8646 > URL: https://issues.apache.org/jira/browse/OAK-8646 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: documentmk >Reporter: Marcel Reutegger >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > > The Revision Garbage Collector currently does not clean up changes from > orphaned branch commits. Those are branch commits that have not been merged > but are still present on documents in the DocumentStore. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-8646) Clean up changes from orphaned branch commits
[ https://issues.apache.org/jira/browse/OAK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833623#comment-17833623 ] Stefan Egli commented on OAK-8646: -- [~reschke], as we're using a feature branch I would suggest we can skip the fix version for these? > Clean up changes from orphaned branch commits > - > > Key: OAK-8646 > URL: https://issues.apache.org/jira/browse/OAK-8646 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: documentmk >Reporter: Marcel Reutegger >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > > The Revision Garbage Collector currently does not clean up changes from > orphaned branch commits. Those are branch commits that have not been merged > but are still present on documents in the DocumentStore. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10193) Garbage collect deleted properties
[ https://issues.apache.org/jira/browse/OAK-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10193: -- Component/s: documentmk > Garbage collect deleted properties > -- > > Key: OAK-10193 > URL: https://issues.apache.org/jira/browse/OAK-10193 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: documentmk >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10378) Add metrics for detailed GC
[ https://issues.apache.org/jira/browse/OAK-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10378: -- Component/s: documentmk > Add metrics for detailed GC > --- > > Key: OAK-10378 > URL: https://issues.apache.org/jira/browse/OAK-10378 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: documentmk >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > > We need to provide the support to collect metrics for all the > deletion/updation done as part of detailedGC cycles. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10676) Consider late-writes while removing deleted properties during detailedGC
[ https://issues.apache.org/jira/browse/OAK-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10676: -- Component/s: documentmk > Consider late-writes while removing deleted properties during detailedGC > > > Key: OAK-10676 > URL: https://issues.apache.org/jira/browse/OAK-10676 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: documentmk >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > > We need to take into account the late-writes or inconsistent revisions while > removing deleted properties. > > For e.g. In case the property is null in latest revision but that revision is > itself not valid/committed/broken, we might need to skip removal of such > properties. -- This message was sent by Atlassian Jira (v8.20.10#820010)
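The late-write consideration in OAK-10676 can be illustrated with a conservative check like the one below. It is a sketch under assumptions, not the actual fix: `LateWriteCheck` and `canRemoveDeletedProperty` are hypothetical names, revisions are simplified to `long` values ordered by time (Oak's real revisions carry more information), a `null` value stands for "property deleted in that revision", and `isCommitted` stands in for whatever commit validation the GC performs.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.function.Predicate;

/** Hypothetical check illustrating the late-write rule; not Oak's actual implementation. */
class LateWriteCheck {
    /**
     * A property is only treated as removable garbage if its latest revision
     * records a deletion (null value) AND that revision is committed.
     * If the latest revision is uncommitted or broken, removal is skipped.
     */
    static boolean canRemoveDeletedProperty(NavigableMap<Long, String> valuesByRevision,
                                            Predicate<Long> isCommitted) {
        if (valuesByRevision.isEmpty()) {
            return false; // nothing known about the property, be conservative
        }
        Map.Entry<Long, String> latest = valuesByRevision.lastEntry();
        return latest.getValue() == null && isCommitted.test(latest.getKey());
    }
}
```

The design choice here is to err on the side of keeping data: whenever the latest revision cannot be confirmed as committed, the property is left alone.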
[jira] [Updated] (OAK-10382) oak-run support for flatfile
[ https://issues.apache.org/jira/browse/OAK-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10382: -- Component/s: documentmk > oak-run support for flatfile > > > Key: OAK-10382 > URL: https://issues.apache.org/jira/browse/OAK-10382 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk, oak-run >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > Fix For: 1.60.0 > > > As a follow-up of OAK-10347 we need a wrapper of the SimpleFlatFileUtil - > plus (potentially) a full-gc command which runs a full round of detail gc (in > DocumentNodeStore that is) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10710) Reset detailedGC settings after running the detailedGC cycle
[ https://issues.apache.org/jira/browse/OAK-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10710: -- Component/s: oak-run > Reset detailedGC settings after running the detailedGC cycle > > > Key: OAK-10710 > URL: https://issues.apache.org/jira/browse/OAK-10710 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: oak-run >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10736) Collect DetailedGC Stats for DryRun mode
[ https://issues.apache.org/jira/browse/OAK-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10736: -- Component/s: documentmk > Collect DetailedGC Stats for DryRun mode > > > Key: OAK-10736 > URL: https://issues.apache.org/jira/browse/OAK-10736 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10676) Consider late-writes while removing deleted properties during detailedGC
[ https://issues.apache.org/jira/browse/OAK-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10676: -- Labels: DetailedGC (was: ) > Consider late-writes while removing deleted properties during detailedGC > > > Key: OAK-10676 > URL: https://issues.apache.org/jira/browse/OAK-10676 > Project: Jackrabbit Oak > Issue Type: New Feature >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > > We need to take late writes and inconsistent revisions into account while > removing deleted properties. > > For example, if the property is null in the latest revision but that revision is > itself invalid, uncommitted, or broken, we might need to skip removal of such > properties. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10370) Dry-run mode for full GC
[ https://issues.apache.org/jira/browse/OAK-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10370: -- Labels: DetailedGC (was: ) > Dry-run mode for full GC > > > Key: OAK-10370 > URL: https://issues.apache.org/jira/browse/OAK-10370 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: documentmk >Reporter: Ankita Agarwal >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > > For detailed GC (OAK-10199), a dry-run mode is required in which nothing is > deleted; garbage such as orphaned branch commits and deleted properties is only listed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10632) Make Embedded DetailedGC Configurable for dryRun mode
[ https://issues.apache.org/jira/browse/OAK-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10632: -- Labels: DetailedGC (was: ) > Make Embedded DetailedGC Configurable for dryRun mode > - > > Key: OAK-10632 > URL: https://issues.apache.org/jira/browse/OAK-10632 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: documentmk >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > > We have introduced embedded verification of detailedGC in both normal & > dryRun mode. > We need to make embedded verification configurable in dryRun mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10193) Garbage collect deleted properties
[ https://issues.apache.org/jira/browse/OAK-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10193: -- Labels: DetailedGC (was: ) > Garbage collect deleted properties > -- > > Key: OAK-10193 > URL: https://issues.apache.org/jira/browse/OAK-10193 > Project: Jackrabbit Oak > Issue Type: Improvement >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-8646) Clean up changes from orphaned branch commits
[ https://issues.apache.org/jira/browse/OAK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-8646: - Labels: DetailedGC (was: ) > Clean up changes from orphaned branch commits > - > > Key: OAK-8646 > URL: https://issues.apache.org/jira/browse/OAK-8646 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: documentmk >Reporter: Marcel Reutegger >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > > The Revision Garbage Collector currently does not clean up changes from > orphaned branch commits. Those are branch commits that have not been merged > but are still present on documents in the DocumentStore. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10199) Skeleton of an additional, extendable "detail" garbage collector based on only "_modified"
[ https://issues.apache.org/jira/browse/OAK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10199: -- Labels: DetailedGC (was: ) > Skeleton of an additional, extendable "detail" garbage collector based on > only "_modified" > -- > > Key: OAK-10199 > URL: https://issues.apache.org/jira/browse/OAK-10199 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: documentmk >Reporter: Stefan Egli >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > > DocumentNodeStore's revision garbage collector currently doesn't clean up > 100% of garbage. Several gaps have so far been identified, including: > * OAK-8646 : "Clean up changes from orphaned branch commits" > * OAK-10193 : "Garbage collect deleted properties" > The common aspect of the above is the fact that cleaning up that garbage on > an existing repository requires a full scan of the entire repository > to find and delete such garbage. > The current working title for this is "detail gc". > The ticket here is about creating a skeleton of a garbage collector that the > above, individual garbage types can then "hook into". > There are two parts of the cleanup: > * an initial, full repository scan > * an iterative, continuous scan (e.g. after the above full scan has completed) > The full repository scan is optional - one could decide to leave the garbage > and not worry about it (but enable the continuous scan and thus clean up > documents that are changed in the future lazily). > While the two parts could in theory be based on different queries, they _can_ > also be based on the same query. > One suggested query is to go through all documents where "_modified" is > between the previous gc run and an increment, but older than the > 'versionGcMaxAgeInSecs' (24h by default) - plus e.g. taking checkpoints into > account. > A full repository scan is then characterized by setting this "previous gc > run" pointer to zero. 
> In particular for the full repository scan it is necessary for the gc to run > in reasonably small batches - and apply a voluntary throttle, to avoid system > overload. -- This message was sent by Atlassian Jira (v8.20.10#820010)
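The suggested "_modified" window from OAK-10199 could be computed along these lines. This is only a sketch under stated assumptions: `DetailGcWindow`, its `next` method and all parameter names are hypothetical, timestamps are plain epoch milliseconds, and capping the window at the oldest checkpoint is one possible reading of "taking checkpoints into account".

```java
// Hypothetical value class, not part of Oak: one scan window over "_modified".
final class DetailGcWindow {
    final long fromMs; // inclusive lower bound ("previous gc run" pointer)
    final long toMs;   // exclusive upper bound

    private DetailGcWindow(long fromMs, long toMs) {
        this.fromMs = fromMs;
        this.toMs = toMs;
    }

    /**
     * Computes the next window: starts at the previous gc pointer, advances by
     * at most incrementMs, and never passes (now - maxAge) or the oldest checkpoint.
     * A previousGcMs of zero corresponds to starting a full repository scan.
     */
    static DetailGcWindow next(long previousGcMs, long incrementMs, long nowMs,
                               long maxAgeMs, long oldestCheckpointMs) {
        long upper = Math.min(previousGcMs + incrementMs, nowMs - maxAgeMs);
        upper = Math.min(upper, oldestCheckpointMs);
        // never move backwards; an empty window means there is nothing to scan yet
        return new DetailGcWindow(previousGcMs, Math.max(previousGcMs, upper));
    }

    boolean isEmpty() {
        return toMs <= fromMs;
    }
}
```

A caller would run one small batch per window, sleep for a throttle interval, then call `next` again with the previous `toMs` as the new pointer.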
[jira] [Updated] (OAK-10378) Add metrics for detailed GC
[ https://issues.apache.org/jira/browse/OAK-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10378: -- Labels: DetailedGC (was: ) > Add metrics for detailed GC > --- > > Key: OAK-10378 > URL: https://issues.apache.org/jira/browse/OAK-10378 > Project: Jackrabbit Oak > Issue Type: New Feature >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > > We need to provide support for collecting metrics for all > deletions and updates done as part of detailedGC cycles. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10633) Make Embedded DetailedGC Configurable in detailedGC
[ https://issues.apache.org/jira/browse/OAK-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10633: -- Labels: DetailedGC (was: ) > Make Embedded DetailedGC Configurable in detailedGC > --- > > Key: OAK-10633 > URL: https://issues.apache.org/jira/browse/OAK-10633 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: documentmk >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10535) Clean up old revisions in a document
[ https://issues.apache.org/jira/browse/OAK-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10535: -- Labels: DetailedGC (was: ) > Clean up old revisions in a document > > > Key: OAK-10535 > URL: https://issues.apache.org/jira/browse/OAK-10535 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: documentmk >Reporter: José Andrés Cordero Benítez >Assignee: José Andrés Cordero Benítez >Priority: Minor > Labels: DetailedGC > > Introduce a way to safely detect and delete old revisions in a document. This > could be useful to clean up documents that sometimes grow above the supported > size in MongoDB (16MB). > It could also be integrated into the detailed GC. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10659) Remove orphaned nodes/documents
[ https://issues.apache.org/jira/browse/OAK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10659: -- Labels: DetailedGC (was: ) > Remove orphaned nodes/documents > --- > > Key: OAK-10659 > URL: https://issues.apache.org/jira/browse/OAK-10659 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > Fix For: 1.62.0 > > > As part of DetailedGC (also see OAK-10199) we also need to clean up documents > that (for some reason) have become orphaned. Orphaned nodes are nodes without > a parent, i.e. they fulfill two criteria: > * they cannot be traversed to - the traversed state would be null / > non-existent > * but reading the node via getNodeAtRevision would properly resolve to an > existing node -- This message was sent by Atlassian Jira (v8.20.10#820010)
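The two criteria for an orphaned node in OAK-10659 can be illustrated with a toy model. The sketch below is an assumption-laden simplification and not Oak code: the repository is reduced to a set of existing paths, `OrphanCheck` and its methods are hypothetical names, and the direct set lookup merely stands in for what getNodeAtRevision does in the real store.

```java
import java.util.Set;

// Hypothetical helper, not part of Oak: models documents as a set of paths.
final class OrphanCheck {

    /** Walks from the root towards path; fails as soon as an ancestor or the node itself is missing. */
    static boolean reachableByTraversal(Set<String> existingPaths, String path) {
        if (!existingPaths.contains("/")) {
            return false;
        }
        if (path.equals("/")) {
            return true;
        }
        StringBuilder current = new StringBuilder();
        for (String name : path.substring(1).split("/")) {
            current.append('/').append(name);
            if (!existingPaths.contains(current.toString())) {
                return false; // traversed state would be null here
            }
        }
        return true;
    }

    /** Orphaned: the document resolves when read directly, but traversal cannot reach it. */
    static boolean isOrphaned(Set<String> existingPaths, String path) {
        boolean readableDirectly = existingPaths.contains(path);
        return readableDirectly && !reachableByTraversal(existingPaths, path);
    }
}
```

For the path set `/`, `/a`, `/a/b`, `/x/y`, the node `/x/y` is orphaned because its parent `/x` is missing, while `/a/b` is not.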
[jira] [Updated] (OAK-10689) Extend oak-run revisions command with "detail" garbage collection
[ https://issues.apache.org/jira/browse/OAK-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10689: -- Labels: DetailedGC (was: ) > Extend oak-run revisions command with "detail" garbage collection > - > > Key: OAK-10689 > URL: https://issues.apache.org/jira/browse/OAK-10689 > Project: Jackrabbit Oak > Issue Type: Task > Components: oak-run >Reporter: José Andrés Cordero Benítez >Assignee: José Andrés Cordero Benítez >Priority: Minor > Labels: DetailedGC > > Extend the oak-run revisions command to perform a detailed cleanup on a given > document. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10710) Reset detailedGC settings after running the detailedGC cycle
[ https://issues.apache.org/jira/browse/OAK-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10710: -- Labels: DetailedGC (was: ) > Reset detailedGC settings after running the detailedGC cycle > > > Key: OAK-10710 > URL: https://issues.apache.org/jira/browse/OAK-10710 > Project: Jackrabbit Oak > Issue Type: New Feature >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10736) Collect DetailedGC Stats for DryRun mode
[ https://issues.apache.org/jira/browse/OAK-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10736: -- Labels: DetailedGC (was: ) > Collect DetailedGC Stats for DryRun mode > > > Key: OAK-10736 > URL: https://issues.apache.org/jira/browse/OAK-10736 > Project: Jackrabbit Oak > Issue Type: Task >Reporter: Rishabh Daim >Assignee: Rishabh Daim >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10382) oak-run support for flatfile
[ https://issues.apache.org/jira/browse/OAK-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10382: -- Epic Link: OAK-10739 > oak-run support for flatfile > > > Key: OAK-10382 > URL: https://issues.apache.org/jira/browse/OAK-10382 > Project: Jackrabbit Oak > Issue Type: Task > Components: oak-run >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > Fix For: 1.60.0 > > > As a follow-up of OAK-10347 we need a wrapper of the SimpleFlatFileUtil - > plus (potentially) a full-gc command which runs a full round of detail gc (in > DocumentNodeStore that is) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10586) DetailedGC hardening
[ https://issues.apache.org/jira/browse/OAK-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10586: -- Epic Link: OAK-10739 > DetailedGC hardening > > > Key: OAK-10586 > URL: https://issues.apache.org/jira/browse/OAK-10586 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Priority: Major > Labels: DetailedGC > > Umbrella ticket for hardening of the {{DetailedGC/OAK-10199}} branch, to avoid > creating too many tickets. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10583) repeat detailedGC also if provided scope not fully processed
[ https://issues.apache.org/jira/browse/OAK-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10583: -- Epic Link: OAK-10739 > repeat detailedGC also if provided scope not fully processed > > > Key: OAK-10583 > URL: https://issues.apache.org/jira/browse/OAK-10583 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Minor > Labels: DetailedGC > > Currently {{needrepeat}} is not set if the provided (detailedGC) scope is > "complete", i.e. it reaches the oldest checkpoint or now - maxTimeMillis. > However, in particular for the initial detailedGC run, > PROGRESS_BATCH_SIZE will likely be hit and thus prevent the full scope from being > scanned. > A repetition of GC will continue from where the batch-interrupted previous > run left off; however, {{needrepeat}} is not correctly set in this case. -- This message was sent by Atlassian Jira (v8.20.10#820010)
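One way to express the intended behaviour of OAK-10583 is shown below. It is a hedged sketch, not the actual fix: `NeedRepeat`, its parameters, and the idea of comparing the last processed timestamp against the scope end are assumptions introduced for illustration.

```java
// Hypothetical helper, not part of Oak: decides whether another GC run is needed.
final class NeedRepeat {

    /**
     * @param scopeIsComplete true if the scope reaches the oldest checkpoint
     *                        or (now - maxTimeMillis)
     * @param scopeToMs       upper bound of the provided scope
     * @param lastProcessedMs timestamp up to which documents were actually processed;
     *                        smaller than scopeToMs if the batch limit interrupted the run
     */
    static boolean needRepeat(boolean scopeIsComplete, long scopeToMs, long lastProcessedMs) {
        boolean scopeFullyProcessed = lastProcessedMs >= scopeToMs;
        // repeat if the scope itself was partial, or if a "complete" scope
        // was cut short before its end was reached
        return !scopeIsComplete || !scopeFullyProcessed;
    }
}
```

The case described in the ticket is the second one: a scope flagged as complete whose processing stopped early still yields `true`.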
[jira] [Updated] (OAK-10570) oak-run support for fullgc
[ https://issues.apache.org/jira/browse/OAK-10570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10570: -- Epic Link: OAK-10739 > oak-run support for fullgc > -- > > Key: OAK-10570 > URL: https://issues.apache.org/jira/browse/OAK-10570 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk, oak-run >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > Fix For: 1.62.0 > > > As a follow-up of OAK-10347 we need a full-gc command which runs a full round > of detail gc (in DocumentNodeStore that is). > (split-off from OAK-10382) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10688) Keep only traversed state, remove all other revisions
[ https://issues.apache.org/jira/browse/OAK-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10688: -- Epic Link: OAK-10739 > Keep only traversed state, remove all other revisions > - > > Key: OAK-10688 > URL: https://issues.apache.org/jira/browse/OAK-10688 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > As a slightly different algorithm to OAK-10535 this ticket suggests > calculating the traversedState of a node, then keeping only those revisions > needed for that traversedState and removing all others. The main difference is > an inversion of logic, where instead of analysing for each revision whether > it must be kept or not - this first derives the revision that must be "kept" > from the traversedState - then deletes all others. > This mechanism applies to all (normal and bundled) properties as well as some > DocumentNodeStore internal ones, such as "_deleted". > Below is a list of assumptions to back this: > * DetailedGC runs only up to the older of the oldest checkpoint and > maxRevisionAge (24h by default). Thus a document analysed by DetailedGC is > guaranteed to have only 1 revision (per property) that must be kept - as it > is guaranteed to not have modifications (revisions) younger than any > checkpoint or maxRevisionAge (24h) > * To find out which revision(s) must be kept, the node tree is traversed from > root (based on current head revision) to the target document. > * Given the first bullet (that we're only looking at nodes that have only 1 > revision (each, per property) to keep), this traversed node state thus > contains the values of those. > * Hence, based on each property key of the traversed state, the > corresponding "commit revision" in the document-local map must be calculated. > That local map entry must be kept - all others can be deleted. 
> * Note that this also cleans up overwritten branch commits of the same branch > (as only the last, relevant one is kept) > As a result of the above, certain other entries can be deleted, namely: > * any "_commitRoot" entry no longer referenced by the local document > * any "_bc" entry no longer referenced by the local document > Independent of the traversedState and the outcome of the cleanup what can > also be removed is: > * any "_revisions" entry older than the current sweepRev > However: "_revisions" entries that might not be referenced by the local > document and are younger than the sweepRev must still be kept, as they might > be referenced by child documents (through their "_commitRoot" pointing to the > current document). Without checking for children and double-checking the > actual use, there could as a result still be some garbage "_revisions" > entries left. -- This message was sent by Atlassian Jira (v8.20.10#820010)
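The inversion of logic proposed in OAK-10688 (derive what to keep from the traversed state, then drop everything else) might look like this in a heavily simplified model. All names here are hypothetical and nothing below is Oak code: revisions are plain `long` values, the document-local map is a nested Java map, and the "commit revision" is approximated as the newest local entry whose value matches the traversed state, which ignores branch commits and commit-root resolution.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical helper, not part of Oak: prunes a document-local map so that
// only the entries backing the traversed state remain.
final class KeepTraversedState {

    static Map<String, NavigableMap<Long, String>> prune(
            Map<String, NavigableMap<Long, String>> localMap,
            Map<String, String> traversedState) {
        Map<String, NavigableMap<Long, String>> result = new HashMap<>();
        for (Map.Entry<String, String> prop : traversedState.entrySet()) {
            NavigableMap<Long, String> revisions = localMap.get(prop.getKey());
            if (revisions == null) {
                continue; // no local entries for this property
            }
            // scan newest-first for the entry that produced the traversed value
            for (Map.Entry<Long, String> e : revisions.descendingMap().entrySet()) {
                if (Objects.equals(prop.getValue(), e.getValue())) {
                    NavigableMap<Long, String> kept = new TreeMap<>();
                    kept.put(e.getKey(), e.getValue());
                    result.put(prop.getKey(), kept);
                    break; // exactly one revision per property is kept
                }
            }
        }
        // properties absent from the traversed state are not copied at all
        return result;
    }
}
```

Given local entries `p -> {1:"a", 2:"b", 3:"c"}` and `q -> {1:"x"}` with a traversed state of `p = "c"`, the pruned map keeps only `p -> {3:"c"}`.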
[jira] [Updated] (OAK-10714) DGC : enable embedded verification for tests by default
[ https://issues.apache.org/jira/browse/OAK-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10714: -- Epic Link: OAK-10739 > DGC : enable embedded verification for tests by default > --- > > Key: OAK-10714 > URL: https://issues.apache.org/jira/browse/OAK-10714 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > We should enable embedded verification for DetailedGC for tests by default. > (It is already enabled by default via DocumentNodeStoreService, but tests > don't always use that) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10584) Checkpoints.getOldestRevisionToKeep shouldn't failed if called read-only
[ https://issues.apache.org/jira/browse/OAK-10584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10584: -- Labels: (was: DetailedGC) > Checkpoints.getOldestRevisionToKeep shouldn't failed if called read-only > > > Key: OAK-10584 > URL: https://issues.apache.org/jira/browse/OAK-10584 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Affects Versions: 1.60.0 >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Minor > Fix For: 1.62.0 > > > The exception below could occur and should be avoided: > {noformat} > java.lang.UnsupportedOperationException: Method - findAndUpdate. Params: > [settings, key: checkpoint update {data.r17cea1494d3-0-1=REMOVE_MAP_ENTRY > null}] > at > org.apache.jackrabbit.oak.plugins.document.util.ReadOnlyDocumentStoreWrapperFactory$1.invoke(ReadOnlyDocumentStoreWrapperFactory.java:38) > at com.sun.proxy.$Proxy0.findAndUpdate(Unknown Source) > at > org.apache.jackrabbit.oak.plugins.document.Checkpoints.getOldestRevisionToKeep(Checkpoints.java:149) > at > org.apache.jackrabbit.oak.plugins.document.VersionGCRecommendations.<init>(VersionGCRecommendations.java:181) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10727) log revisionDetailedGcType
[ https://issues.apache.org/jira/browse/OAK-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10727: -- Epic Link: OAK-10739 > log revisionDetailedGcType > -- > > Key: OAK-10727 > URL: https://issues.apache.org/jira/browse/OAK-10727 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10726) Fix BranchCommitGCTest and make it parameterized by gcType (also for VersionGarbageCollectorIT)
[ https://issues.apache.org/jira/browse/OAK-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10726: -- Epic Link: OAK-10739 > Fix BranchCommitGCTest and make it parameterized by gcType (also for > VersionGarbageCollectorIT) > --- > > Key: OAK-10726 > URL: https://issues.apache.org/jira/browse/OAK-10726 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10715) embedded verification should use traversed nodeState
[ https://issues.apache.org/jira/browse/OAK-10715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10715: -- Epic Link: OAK-10739 > embedded verification should use traversed nodeState > > > Key: OAK-10715 > URL: https://issues.apache.org/jira/browse/OAK-10715 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > Currently DetailedGC's embedded verification uses the headRevision in > getNodeAtRevision. It should use the lastRevision of the traversed nodeState > instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10724) Introduce detailed gc mode that only deletes orphan nodes and deleted properties
[ https://issues.apache.org/jira/browse/OAK-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10724: -- Epic Link: OAK-10739 > Introduce detailed gc mode that only deletes orphan nodes and deleted > properties > > > Key: OAK-10724 > URL: https://issues.apache.org/jira/browse/OAK-10724 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10728) embedded verification fails if id is from long path
[ https://issues.apache.org/jira/browse/OAK-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10728: -- Epic Link: OAK-10739 > embedded verification fails if id is from long path > --- > > Key: OAK-10728 > URL: https://issues.apache.org/jira/browse/OAK-10728 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10734) DetailedGC must keep entries in "_revisions" for non branch commits, unless older than sweep
[ https://issues.apache.org/jira/browse/OAK-10734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10734: -- Epic Link: OAK-10739 > DetailedGC must keep entries in "_revisions" for non branch commits, unless > older than sweep > > > Key: OAK-10734 > URL: https://issues.apache.org/jira/browse/OAK-10734 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > Entries in "_revisions" (for non root documents) could be referenced by > children in case of non branch commits. They must thus be kept. Unless older > than sweep. -- This message was sent by Atlassian Jira (v8.20.10#820010)
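The retention rule stated in OAK-10734 can be sketched as a simple filter. This is an illustration under assumptions, not Oak code: `RevisionsRetention` is a hypothetical name, revisions are reduced to `long` timestamps, the input is assumed to contain only non branch commit entries, and "older than sweep" is read as strictly smaller than the sweep timestamp.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper, not part of Oak: filters "_revisions" entries of
// non branch commits against a sweep timestamp.
final class RevisionsRetention {

    /**
     * Returns a copy holding only the entries that must be kept, i.e. those
     * at or after the sweep timestamp; older entries are treated as removable.
     */
    static NavigableMap<Long, String> retain(NavigableMap<Long, String> nonBranchRevisions,
                                             long sweepMs) {
        // tailMap(sweepMs, true) selects keys >= sweepMs
        return new TreeMap<>(nonBranchRevisions.tailMap(sweepMs, true));
    }
}
```

Entries that survive this filter may still be unreferenced locally; they are kept because child documents could point at them.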
[jira] [Commented] (OAK-10730) Log MongoException previously swallowed
[ https://issues.apache.org/jira/browse/OAK-10730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833500#comment-17833500 ] Stefan Egli commented on OAK-10730: --- Suggestion created at https://github.com/apache/jackrabbit-oak/pull/1399 > Log MongoException previously swallowed > --- > > Key: OAK-10730 > URL: https://issues.apache.org/jira/browse/OAK-10730 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Julian Reschke >Priority: Major > > In > [MongoDocumentStore.create|https://github.com/apache/jackrabbit-oak/blob/2e996d78f0a565b17287af5691f2c1be7d2e925d/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/mongo/MongoDocumentStore.java#L1754-L1756] > a MongoException is silently swallowed. > This code is quite ancient - it was created in svn revision > [1451586|https://svn.apache.org/viewvc?view=revision=1451586] - we > might thus want to be careful not to cause noise in a case where this > swallowing was legitimate. > I would thus suggest to start logging this at debug or info. -- This message was sent by Atlassian Jira (v8.20.10#820010)
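The change suggested in OAK-10730 amounts to keeping the existing control flow while making the exception visible at a low log level. The sketch below shows that pattern only; it is not the code from MongoDocumentStore. `SwallowedExceptionLogging` and `tryCreate` are hypothetical, `RuntimeException` stands in for MongoException, a boolean return is assumed as the failure signal, and `java.util.logging` with `Level.FINE` is used solely to keep the example self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration, not Oak code: log a previously swallowed exception
// at a low level without changing what the caller sees.
final class SwallowedExceptionLogging {
    private static final Logger LOG =
            Logger.getLogger(SwallowedExceptionLogging.class.getName());

    /** Runs the action; reports failure via the return value and logs the cause at FINE. */
    static boolean tryCreate(Runnable action) {
        try {
            action.run();
            return true;
        } catch (RuntimeException e) {
            // previously the exception vanished here; FINE avoids noise
            // in cases where swallowing was legitimate
            LOG.log(Level.FINE, "create failed, returning false", e);
            return false;
        }
    }
}
```

Because the message is emitted at a debug-like level, default logging configurations stay quiet while the cause becomes available when needed.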
[jira] [Resolved] (OAK-10734) DetailedGC must keep entries in "_revisions" for non branch commits, unless older than sweep
[ https://issues.apache.org/jira/browse/OAK-10734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-10734. --- Resolution: Done PR merged > DetailedGC must keep entries in "_revisions" for non branch commits, unless > older than sweep > > > Key: OAK-10734 > URL: https://issues.apache.org/jira/browse/OAK-10734 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > Entries in "_revisions" (for non root documents) could be referenced by > children in case of non branch commits. They must thus be kept. Unless older > than sweep. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10535) Clean up old revisions in a document
[ https://issues.apache.org/jira/browse/OAK-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833215#comment-17833215 ] Stefan Egli commented on OAK-10535: --- [~corderob], I've now un-ignored the test case originally added that was blocked by OAK-10535, as we now have different variants and amongst them one that fixes it, so I'd like to have it unignored. The PR is [PR#1396|https://github.com/apache/jackrabbit-oak/pull/1396] (it is against another PR of mine, but is otherwise just an unignore and adjustments in asserts) > Clean up old revisions in a document > > > Key: OAK-10535 > URL: https://issues.apache.org/jira/browse/OAK-10535 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: documentmk >Reporter: José Andrés Cordero Benítez >Assignee: José Andrés Cordero Benítez >Priority: Minor > > Introduce a way to safely detect and delete old revisions in a document. This > could be useful to clean up documents that sometimes grow above the supported > size in MongoDB (16MB). > It could also be integrated into the detailed GC. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10734) DetailedGC must keep entries in "_revisions" for non branch commits, unless older than sweep
[ https://issues.apache.org/jira/browse/OAK-10734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833096#comment-17833096 ] Stefan Egli commented on OAK-10734: --- * created https://github.com/apache/jackrabbit-oak/pull/1393 > DetailedGC must keep entries in "_revisions" for non branch commits, unless > older than sweep > > > Key: OAK-10734 > URL: https://issues.apache.org/jira/browse/OAK-10734 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > Entries in "_revisions" (for non root documents) could be referenced by > children in case of non branch commits. They must thus be kept. Unless older > than sweep. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10734) DetailedGC must keep entries in "_revisions" for non branch commits, unless older than sweep
[ https://issues.apache.org/jira/browse/OAK-10734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10734: -- Labels: DetailedGC (was: ) > DetailedGC must keep entries in "_revisions" for non branch commits, unless > older than sweep > > > Key: OAK-10734 > URL: https://issues.apache.org/jira/browse/OAK-10734 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > Entries in "_revisions" (for non root documents) could be referenced by > children in case of non branch commits. They must thus be kept. Unless older > than sweep. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10734) DetailedGC must keep entries in "_revisions" for non branch commits, unless older than sweep
Stefan Egli created OAK-10734: - Summary: DetailedGC must keep entries in "_revisions" for non branch commits, unless older than sweep Key: OAK-10734 URL: https://issues.apache.org/jira/browse/OAK-10734 Project: Jackrabbit Oak Issue Type: Task Components: documentmk Reporter: Stefan Egli Assignee: Stefan Egli Entries in "_revisions" (for non root documents) could be referenced by children in case of non branch commits. They must thus be kept. Unless older than sweep. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10730) Log MongoException previously swallowed
Stefan Egli created OAK-10730: - Summary: Log MongoException previously swallowed Key: OAK-10730 URL: https://issues.apache.org/jira/browse/OAK-10730 Project: Jackrabbit Oak Issue Type: Task Components: documentmk Reporter: Stefan Egli In [MongoDocumentStore.create|https://github.com/apache/jackrabbit-oak/blob/2e996d78f0a565b17287af5691f2c1be7d2e925d/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/mongo/MongoDocumentStore.java#L1754-L1756] a MongoException is silently swallowed. This code is quite ancient - it was created in svn revision [1451586|https://svn.apache.org/viewvc?view=revision=1451586] - we might thus want to be careful not to cause noise in a case where this swallowing was legitimate. I would thus suggest to start logging this at debug or info. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (OAK-10727) log revisionDetailedGcType
[ https://issues.apache.org/jira/browse/OAK-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-10727. --- Resolution: Done PR merged, marking done > log revisionDetailedGcType > -- > > Key: OAK-10727 > URL: https://issues.apache.org/jira/browse/OAK-10727 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (OAK-10728) embedded verification fails if id is from long path
[ https://issues.apache.org/jira/browse/OAK-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-10728. --- Resolution: Done PR merged, thx for reviews, marking done > embedded verification fails if id is from long path > --- > > Key: OAK-10728 > URL: https://issues.apache.org/jira/browse/OAK-10728 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10728) embedded verification fails if id is from long path
[ https://issues.apache.org/jira/browse/OAK-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831483#comment-17831483 ] Stefan Egli commented on OAK-10728: --- * PR created at https://github.com/apache/jackrabbit-oak/pull/1389 > embedded verification fails if id is from long path > --- > > Key: OAK-10728 > URL: https://issues.apache.org/jira/browse/OAK-10728 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10728) embedded verification fails if id is from long path
[ https://issues.apache.org/jira/browse/OAK-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10728: -- Labels: DetailedGC (was: ) > embedded verification fails if id is from long path > --- > > Key: OAK-10728 > URL: https://issues.apache.org/jira/browse/OAK-10728 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10728) embedded verification fails if id is from long path
Stefan Egli created OAK-10728: - Summary: embedded verification fails if id is from long path Key: OAK-10728 URL: https://issues.apache.org/jira/browse/OAK-10728 Project: Jackrabbit Oak Issue Type: Task Components: documentmk Reporter: Stefan Egli Assignee: Stefan Egli -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (OAK-10726) Fix BranchCommitGCTest and make it parameterized by gcType (also for VersionGarbageCollectorIT)
[ https://issues.apache.org/jira/browse/OAK-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-10726. --- Resolution: Done PR merged, marking done > Fix BranchCommitGCTest and make it parameterized by gcType (also for > VersionGarbageCollectorIT) > --- > > Key: OAK-10726 > URL: https://issues.apache.org/jira/browse/OAK-10726 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10726) Fix BranchCommitGCTest and make it parameterized by gcType (also for VersionGarbageCollectorIT)
[ https://issues.apache.org/jira/browse/OAK-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10726: -- Summary: Fix BranchCommitGCTest and make it parameterized by gcType (also for VersionGarbageCollectorIT) (was: Fix BranchCommitGCTest and make it parameterized by gcType) > Fix BranchCommitGCTest and make it parameterized by gcType (also for > VersionGarbageCollectorIT) > --- > > Key: OAK-10726 > URL: https://issues.apache.org/jira/browse/OAK-10726 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10727) log revisionDetailedGcType
[ https://issues.apache.org/jira/browse/OAK-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831344#comment-17831344 ] Stefan Egli commented on OAK-10727: --- PR created at https://github.com/apache/jackrabbit-oak/pull/1387 > log revisionDetailedGcType > -- > > Key: OAK-10727 > URL: https://issues.apache.org/jira/browse/OAK-10727 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10727) log revisionDetailedGcType
[ https://issues.apache.org/jira/browse/OAK-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10727: -- Labels: DetailedGC (was: ) > log revisionDetailedGcType > -- > > Key: OAK-10727 > URL: https://issues.apache.org/jira/browse/OAK-10727 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10727) log revisionDetailedGcType
Stefan Egli created OAK-10727: - Summary: log revisionDetailedGcType Key: OAK-10727 URL: https://issues.apache.org/jira/browse/OAK-10727 Project: Jackrabbit Oak Issue Type: Task Components: documentmk Reporter: Stefan Egli Assignee: Stefan Egli -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10726) Fix BranchCommitGCTest and make it parameterized by gcType
[ https://issues.apache.org/jira/browse/OAK-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831339#comment-17831339 ] Stefan Egli commented on OAK-10726: --- PR created at https://github.com/apache/jackrabbit-oak/pull/1386 > Fix BranchCommitGCTest and make it parameterized by gcType > -- > > Key: OAK-10726 > URL: https://issues.apache.org/jira/browse/OAK-10726 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (OAK-10724) Introduce detailed gc mode that only deletes orphan nodes and deleted properties
[ https://issues.apache.org/jira/browse/OAK-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-10724. --- Resolution: Done PR merged BUT requires OAK-10726 to fix the BranchCommitGCTest > Introduce detailed gc mode that only deletes orphan nodes and deleted > properties > > > Key: OAK-10724 > URL: https://issues.apache.org/jira/browse/OAK-10724 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10726) Fix BranchCommitGCTest and make it parameterized by gcType
[ https://issues.apache.org/jira/browse/OAK-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10726: -- Labels: DetailedGC (was: ) > Fix BranchCommitGCTest and make it parameterized by gcType > -- > > Key: OAK-10726 > URL: https://issues.apache.org/jira/browse/OAK-10726 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10726) Fix BranchCommitGCTest and make it parameterized by gcType
Stefan Egli created OAK-10726: - Summary: Fix BranchCommitGCTest and make it parameterized by gcType Key: OAK-10726 URL: https://issues.apache.org/jira/browse/OAK-10726 Project: Jackrabbit Oak Issue Type: Task Components: documentmk Reporter: Stefan Egli Assignee: Stefan Egli -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (OAK-10715) embedded verification should use traversed nodeState
[ https://issues.apache.org/jira/browse/OAK-10715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli resolved OAK-10715.
-------------------------------
    Resolution: Fixed

PR merged, marking done

> embedded verification should use traversed nodeState
> ----------------------------------------------------
>
> Key: OAK-10715
> URL: https://issues.apache.org/jira/browse/OAK-10715
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: documentmk
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Major
>
> Currently DetailedGC's embedded verification uses the headRevision in getNodeAtRevision. It should use the lastRevision of the traversed nodeState instead.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10715) embedded verification should use traversed nodeState
[ https://issues.apache.org/jira/browse/OAK-10715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli updated OAK-10715:
------------------------------
    Labels: DetailedGC  (was: )

> embedded verification should use traversed nodeState
> ----------------------------------------------------
>
> Key: OAK-10715
> URL: https://issues.apache.org/jira/browse/OAK-10715
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: documentmk
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Major
> Labels: DetailedGC
>
> Currently DetailedGC's embedded verification uses the headRevision in getNodeAtRevision. It should use the lastRevision of the traversed nodeState instead.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10724) Introduce detailed gc mode that only deletes orphan nodes and deleted properties
[ https://issues.apache.org/jira/browse/OAK-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10724: -- Labels: DetailedGC (was: ) > Introduce detailed gc mode that only deletes orphan nodes and deleted > properties > > > Key: OAK-10724 > URL: https://issues.apache.org/jira/browse/OAK-10724 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10724) Introduce detailed gc mode that only deletes orphan nodes and deleted properties
[ https://issues.apache.org/jira/browse/OAK-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831012#comment-17831012 ] Stefan Egli commented on OAK-10724: --- * PR created at https://github.com/apache/jackrabbit-oak/pull/1383 > Introduce detailed gc mode that only deletes orphan nodes and deleted > properties > > > Key: OAK-10724 > URL: https://issues.apache.org/jira/browse/OAK-10724 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10724) Introduce detailed gc mode that only deletes orphan nodes and deleted properties
Stefan Egli created OAK-10724: - Summary: Introduce detailed gc mode that only deletes orphan nodes and deleted properties Key: OAK-10724 URL: https://issues.apache.org/jira/browse/OAK-10724 Project: Jackrabbit Oak Issue Type: Task Components: documentmk Reporter: Stefan Egli Assignee: Stefan Egli -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10715) embedded verification should use traversed nodeState
[ https://issues.apache.org/jira/browse/OAK-10715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830916#comment-17830916 ]

Stefan Egli commented on OAK-10715:
-----------------------------------

* draft PR was discarded
* new PR created at https://github.com/apache/jackrabbit-oak/pull/1377

> embedded verification should use traversed nodeState
> ----------------------------------------------------
>
> Key: OAK-10715
> URL: https://issues.apache.org/jira/browse/OAK-10715
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: documentmk
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Major
>
> Currently DetailedGC's embedded verification uses the headRevision in getNodeAtRevision. It should use the lastRevision of the traversed nodeState instead.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10688) Keep only traversed state, remove all other revisions
[ https://issues.apache.org/jira/browse/OAK-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli updated OAK-10688:
------------------------------
    Labels: DetailedGC  (was: )

> Keep only traversed state, remove all other revisions
> -----------------------------------------------------
>
> Key: OAK-10688
> URL: https://issues.apache.org/jira/browse/OAK-10688
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: documentmk
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Major
> Labels: DetailedGC
>
> As a slightly different algorithm to OAK-10535, this ticket suggests calculating the traversedState of a node, then keeping only those revisions needed for that traversedState and removing all others. The main difference is an inversion of logic: instead of analysing for each revision whether it must be kept or not, this first derives the revisions that must be kept from the traversedState, then deletes all others.
> This mechanism applies to all (normal and bundled) properties as well as some DocumentNodeStore internal ones, such as "_deleted".
> Below is a list of assumptions to back this:
> * DetailedGC runs only up to the older of the oldest checkpoint and maxRevisionAge (24h by default). Thus a document analysed by DetailedGC is guaranteed to have only 1 revision (per property) that must be kept, as it is guaranteed not to have modifications (revisions) younger than any checkpoint or maxRevisionAge (24h).
> * To find out which revision(s) must be kept, the node tree is traversed from root (based on the current head revision) to the target document.
> * Given the first bullet (that we're only looking at nodes that have only 1 revision, per property, to keep), this traversed node state thus contains the values of those.
> * Hence, for each property key of the traversed state, the corresponding "commit revision" in the document-local map must be calculated. That local map entry must be kept; all others can be deleted.
> * Note that this also cleans up overwritten branch commits of the same branch (as only the last, relevant one is kept).
> As a result of the above, certain other entries can be deleted, namely:
> * any "_commitRoot" entry no longer referenced by the local document
> * any "_bc" entry no longer referenced by the local document
> Independent of the traversedState and the outcome of the cleanup, what can also be removed is:
> * any "_revisions" entry older than the current sweepRev
> However, "_revisions" entries that might not be referenced by the local document and are younger than the sweepRev must still be kept, as they might be referenced by child documents (through their "_commitRoot" pointing to the current document). Without checking for children and double-checking the actual use, some garbage "_revisions" entries could therefore still be left.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
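
The inverted logic described in OAK-10688 (first derive what must be kept from the traversed state, then treat everything else as garbage) can be sketched with a toy model. This is an illustration under strong simplifying assumptions, not Oak's implementation: a document is a map from property name to a local map of revision timestamp to value, and the "commit revision" is found by matching the traversed value against the local map, whereas resolving commit revisions in Oak (commit roots, branch commits, visibility) is more involved. The names KeepTraversedStateSketch, revisionToKeep and removableRevisions are hypothetical, and the follow-up cleanup of "_commitRoot", "_bc" and "_revisions" is not covered.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of "keep only the traversed state, remove all other revisions". */
class KeepTraversedStateSketch {

    /**
     * For one property, returns the revision to keep: the newest local-map
     * entry whose value matches the value seen in the traversed state.
     * Returns null if no entry matches.
     */
    static Long revisionToKeep(TreeMap<Long, String> localMap, String traversedValue) {
        for (Map.Entry<Long, String> e : localMap.descendingMap().entrySet()) {
            if (e.getValue().equals(traversedValue)) {
                return e.getKey();
            }
        }
        return null;
    }

    /** Derives the kept revision per property first; all other revisions are removable. */
    static Map<String, Set<Long>> removableRevisions(
            Map<String, TreeMap<Long, String>> document,
            Map<String, String> traversedState) {
        Map<String, Set<Long>> removable = new TreeMap<>();
        for (Map.Entry<String, String> prop : traversedState.entrySet()) {
            TreeMap<Long, String> localMap = document.get(prop.getKey());
            if (localMap == null) {
                continue; // property has no local map on this document
            }
            Long keep = revisionToKeep(localMap, prop.getValue());
            if (keep == null) {
                continue; // be conservative: remove nothing if unsure
            }
            Set<Long> garbage = new TreeSet<>(localMap.keySet());
            garbage.remove(keep);
            removable.put(prop.getKey(), garbage);
        }
        return removable;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> title = new TreeMap<>();
        title.put(10L, "a");
        title.put(20L, "b");
        title.put(30L, "c");
        TreeMap<Long, String> deleted = new TreeMap<>();
        deleted.put(5L, "false");

        Map<String, TreeMap<Long, String>> document = new TreeMap<>();
        document.put("title", title);
        document.put("_deleted", deleted);

        Map<String, String> traversedState = new TreeMap<>();
        traversedState.put("title", "c");
        traversedState.put("_deleted", "false");

        // prints {_deleted=[], title=[10, 20]}
        System.out.println(removableRevisions(document, traversedState));
    }
}
```

Properties that do not appear in the traversed state are left untouched by this sketch; how they are handled is outside what it models.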
[jira] [Resolved] (OAK-10688) Keep only traversed state, remove all other revisions
[ https://issues.apache.org/jira/browse/OAK-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli resolved OAK-10688.
-------------------------------
    Resolution: Fixed

[PR|https://github.com/apache/jackrabbit-oak/pull/1372] merged, marking done

> Keep only traversed state, remove all other revisions
> -----------------------------------------------------
>
> Key: OAK-10688
> URL: https://issues.apache.org/jira/browse/OAK-10688
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: documentmk
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Major
> Labels: DetailedGC
>
> As a slightly different algorithm to OAK-10535, this ticket suggests calculating the traversedState of a node, then keeping only those revisions needed for that traversedState and removing all others. The main difference is an inversion of logic: instead of analysing for each revision whether it must be kept or not, this first derives the revisions that must be kept from the traversedState, then deletes all others.
> This mechanism applies to all (normal and bundled) properties as well as some DocumentNodeStore internal ones, such as "_deleted".
> Below is a list of assumptions to back this:
> * DetailedGC runs only up to the older of the oldest checkpoint and maxRevisionAge (24h by default). Thus a document analysed by DetailedGC is guaranteed to have only 1 revision (per property) that must be kept, as it is guaranteed not to have modifications (revisions) younger than any checkpoint or maxRevisionAge (24h).
> * To find out which revision(s) must be kept, the node tree is traversed from root (based on the current head revision) to the target document.
> * Given the first bullet (that we're only looking at nodes that have only 1 revision, per property, to keep), this traversed node state thus contains the values of those.
> * Hence, for each property key of the traversed state, the corresponding "commit revision" in the document-local map must be calculated. That local map entry must be kept; all others can be deleted.
> * Note that this also cleans up overwritten branch commits of the same branch (as only the last, relevant one is kept).
> As a result of the above, certain other entries can be deleted, namely:
> * any "_commitRoot" entry no longer referenced by the local document
> * any "_bc" entry no longer referenced by the local document
> Independent of the traversedState and the outcome of the cleanup, what can also be removed is:
> * any "_revisions" entry older than the current sweepRev
> However, "_revisions" entries that might not be referenced by the local document and are younger than the sweepRev must still be kept, as they might be referenced by child documents (through their "_commitRoot" pointing to the current document). Without checking for children and double-checking the actual use, some garbage "_revisions" entries could therefore still be left.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (OAK-10714) DGC : enable embedded verification for tests by default
[ https://issues.apache.org/jira/browse/OAK-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-10714. --- Resolution: Done PR merged to detailedgc branch, marking done. > DGC : enable embedded verification for tests by default > --- > > Key: OAK-10714 > URL: https://issues.apache.org/jira/browse/OAK-10714 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > > We should enable embedded verification for DetailedGC for tests by default. > (It is already enabled by default via DocumentNodeStoreService, but tests > don't always use that) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OAK-10714) DGC : enable embedded verification for tests by default
[ https://issues.apache.org/jira/browse/OAK-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-10714: -- Labels: DetailedGC (was: ) > DGC : enable embedded verification for tests by default > --- > > Key: OAK-10714 > URL: https://issues.apache.org/jira/browse/OAK-10714 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > Labels: DetailedGC > > We should enable embedded verification for DetailedGC for tests by default. > (It is already enabled by default via DocumentNodeStoreService, but tests > don't always use that) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10715) embedded verification should use traversed nodeState
[ https://issues.apache.org/jira/browse/OAK-10715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829572#comment-17829572 ]

Stefan Egli commented on OAK-10715:
-----------------------------------

* draft PR created at https://github.com/apache/jackrabbit-oak/pull/1375
* draft as we should first merge OAK-10688 and then have a PR directly to DetailedGC (not a PR to a branch of a branch)

> embedded verification should use traversed nodeState
> ----------------------------------------------------
>
> Key: OAK-10715
> URL: https://issues.apache.org/jira/browse/OAK-10715
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: documentmk
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Major
>
> Currently DetailedGC's embedded verification uses the headRevision in getNodeAtRevision. It should use the lastRevision of the traversed nodeState instead.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10715) embedded verification should use traversed nodeState
Stefan Egli created OAK-10715:
---------------------------------

Summary: embedded verification should use traversed nodeState
Key: OAK-10715
URL: https://issues.apache.org/jira/browse/OAK-10715
Project: Jackrabbit Oak
Issue Type: Task
Components: documentmk
Reporter: Stefan Egli
Assignee: Stefan Egli

Currently DetailedGC's embedded verification uses the headRevision in getNodeAtRevision. It should use the lastRevision of the traversed nodeState instead.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OAK-10714) DGC : enable embedded verification for tests by default
[ https://issues.apache.org/jira/browse/OAK-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829569#comment-17829569 ] Stefan Egli commented on OAK-10714: --- * PR created at https://github.com/apache/jackrabbit-oak/pull/1374 > DGC : enable embedded verification for tests by default > --- > > Key: OAK-10714 > URL: https://issues.apache.org/jira/browse/OAK-10714 > Project: Jackrabbit Oak > Issue Type: Task > Components: documentmk >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Major > > We should enable embedded verification for DetailedGC for tests by default. > (It is already enabled by default via DocumentNodeStoreService, but tests > don't always use that) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (OAK-10714) DGC : enable embedded verification for tests by default
Stefan Egli created OAK-10714: - Summary: DGC : enable embedded verification for tests by default Key: OAK-10714 URL: https://issues.apache.org/jira/browse/OAK-10714 Project: Jackrabbit Oak Issue Type: Task Components: documentmk Reporter: Stefan Egli Assignee: Stefan Egli We should enable embedded verification for DetailedGC for tests by default. (It is already enabled by default via DocumentNodeStoreService, but tests don't always use that) -- This message was sent by Atlassian Jira (v8.20.10#820010)