rishabhdaim commented on code in PR #993:
URL: https://github.com/apache/jackrabbit-oak/pull/993#discussion_r1277383337
##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java:
##########
@@ -521,6 +614,121 @@ private VersionGCStats gc(long maxRevisionAgeInMillis) throws IOException {
return stats;
}
+ /**
+ * "Detail garbage" refers to additional garbage identified as part of OAK-10199
+ * et al: essentially garbage that in earlier versions of Oak were ignored. This
+ * includes: deleted properties, revision information within documents, branch
+ * commit related garbage.
+ * <p/>
+ * TODO: limit this to run only on a singleton instance, eg the cluster leader
+ * <p/>
+ * The "detail garbage" collector can be instructed to do a full repository scan
+ * - or incrementally based on where it last left off. When doing a full
+ * repository scan (but not limited to that), it executes in (small) batches
+ * followed by voluntary paused (aka throttling) to avoid excessive load on the
+ * system. The full repository scan does not have to finish particularly fast,
+ * it is okay that it takes a considerable amount of time.
+ *
+ * @param phases {@link GCPhases}
+ * @param headRevision the current head revision of node store
+ * @param rec {@link VersionGCRecommendations} to recommend GC operation
+ */
+ private void collectDetailedGarbage(final GCPhases phases, final RevisionVector headRevision, final VersionGCRecommendations rec)
+         throws IOException {
+     int docsTraversed = 0;
+     boolean foundDoc = true;
+     final long oldestModifiedMs = rec.scopeDetailedGC.fromMs;
+     final long toModified = rec.scopeDetailedGC.toMs;
+     long oldModifiedMs = oldestModifiedMs;
+     final String oldestModifiedDocId = rec.detailedGCId;
+     try (DetailedGC gc = new DetailedGC(headRevision, monitor, cancel)) {
+         long fromModified = oldestModifiedMs;
+         String fromId = ofNullable(oldestModifiedDocId).orElse(MIN_ID_VALUE);
+         NodeDocument lastDoc;
+         if (phases.start(GCPhase.DETAILED_GC)) {
+             while (foundDoc && fromModified < toModified && docsTraversed < PROGRESS_BATCH_SIZE) {
+                 // set foundDoc to false to allow exiting the while loop
+                 foundDoc = false;
+                 lastDoc = null;
+                 Iterable<NodeDocument> itr = versionStore.getModifiedDocs(fromModified, toModified, 1000, fromId);
Review Comment:
I will create a constant for the query limit. Yes, we have a test validating
that the while loop exits.
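
For illustration only, here is a minimal, self-contained sketch of the two points above: the literal `1000` pulled out into a named constant, and the kind of check that shows the loop exits once a query returns no documents. The names `DETAILED_GC_QUERY_LIMIT`, `DetailedGcBatchSketch`, `Doc` and `traverse` are hypothetical, and the in-memory `getModifiedDocs` is only a stand-in whose paging semantics are assumed, not the real `VersionGCSupport` behaviour.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DetailedGcBatchSketch {

    // Hypothetical constant replacing the literal 1000 in the query call.
    static final int DETAILED_GC_QUERY_LIMIT = 1000;

    // Stand-in for NodeDocument: only the fields the loop needs.
    static final class Doc {
        final String id;
        final long modified;

        Doc(String id, long modified) {
            this.id = id;
            this.modified = modified;
        }
    }

    // Stand-in for versionStore.getModifiedDocs(...). Assumed semantics: return at
    // most `limit` docs with modified < toModified, ordered by (modified, id), that
    // come strictly after the (fromModified, fromId) position.
    static List<Doc> getModifiedDocs(List<Doc> store, long fromModified, long toModified,
                                     int limit, String fromId) {
        List<Doc> result = new ArrayList<>();
        store.stream()
                .filter(d -> d.modified < toModified)
                .filter(d -> d.modified > fromModified
                        || (d.modified == fromModified && d.id.compareTo(fromId) > 0))
                .sorted(Comparator.comparingLong((Doc d) -> d.modified).thenComparing(d -> d.id))
                .limit(limit)
                .forEach(result::add);
        return result;
    }

    // Same loop shape as in the diff: keep querying until a query returns nothing,
    // the time window is exhausted, or the batch cap is reached. Returns the number
    // of documents visited.
    static int traverse(List<Doc> store, long fromModified, long toModified,
                        int queryLimit, int batchSize) {
        int docsTraversed = 0;
        boolean foundDoc = true;
        String fromId = ""; // stands in for MIN_ID_VALUE
        while (foundDoc && fromModified < toModified && docsTraversed < batchSize) {
            // reset so that an empty result ends the loop
            foundDoc = false;
            for (Doc doc : getModifiedDocs(store, fromModified, toModified, queryLimit, fromId)) {
                foundDoc = true;
                docsTraversed++;
                fromModified = doc.modified;
                fromId = doc.id;
            }
        }
        return docsTraversed;
    }

    public static void main(String[] args) {
        List<Doc> store = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            store.add(new Doc("doc-" + i, i));
        }
        System.out.println(traverse(store, 0, 10, DETAILED_GC_QUERY_LIMIT, 100));
    }
}
```

Note that in this sketch the batch cap is only checked between queries, so a run can overshoot it by up to one query's worth of documents. Whether that matters for the real collector depends on code outside the quoted hunk.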