stefan-egli commented on code in PR #1314:
URL: https://github.com/apache/jackrabbit-oak/pull/1314#discussion_r1497822015
##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/util/Utils.java:
##########
@@ -274,6 +277,61 @@ private static String diagsForEntry(Map.Entry<String,
PropertyStats> member) {
}
}
+ /**
+ * Produce an {@link UpdateOp} suitable for shrinking branch revision
entries for given property in {@link Document}, {@code null} otherwise.
+ *
+ * @param doc document to inspect for repeated branch commits
+ * @param propertName property to check for
+ * @param revisionChecker filter for revisions (for instance, to check for
cluster id)
+ * @return {@link UpdateOp} suitable for shrinking document, {@code null}
otherwise
+ */
+ public static @Nullable UpdateOp getShrinkOp(Document doc, String
propertyName, Predicate<Revision> revisionChecker) {
Review Comment:
If `doc` could be a `NodeDocument` instead of the generic `Document` some of
those `instanceof` could be avoided...
##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/util/Utils.java:
##########
@@ -274,6 +278,73 @@ private static String diagsForEntry(Map.Entry<String,
PropertyStats> member) {
}
}
+ /**
+ * @return cluster if from first revision found in op, {@code -1} otherwise
+ */
+ public static int extractClusterId(UpdateOp op) {
+ for (Key key : op.getChanges().keySet()) {
+ if (key.getRevision() != null) {
+ return key.getRevision().getClusterId();
+ }
+ }
+ return -1;
+ }
+
+ /**
+ * Produce an {@link UpdateOp} suitable for shrinking branch revision
entries for given property in {@link Document}, {@code null} otherwise.
+ *
+ * @param doc document to inspect for repeated branch commits
+ * @param propertName property to check for
+ * @param revisionChecker filter for revisions (for instance, to check for
cluster id)
+ * @return {@link UpdateOp} suitable for shrinking document, {@code null}
otherwise
+ */
+ public static @Nullable UpdateOp getShrinkOp(Document doc, String
propertyName, Predicate<Revision> revisionChecker) {
+ Object t_bc = doc.get("_bc");
+ Object t_property = doc.get(propertyName);
+ if (t_bc instanceof Map && t_property instanceof Map) {
+ @SuppressWarnings("unchecked")
+ Map<Revision, String> _bc = (Map<Revision, String>)t_bc;
+ @SuppressWarnings("unchecked")
+ Map<Revision, String> pMap = (Map<Revision, String>)t_property;
+ List<Revision> revs = new ArrayList<>();
+ for (Map.Entry<Revision, String> en : pMap.entrySet()) {
+ Revision r = en.getKey();
+ if (revisionChecker.apply(r)) {
+ String bcv = _bc.get(r);
+ if ("true".equals(bcv)) {
+ revs.add(r);
+ }
+ }
+ }
+ // sort by age
+ Collections.sort(revs, new Comparator<Revision>() {
+ @Override
+ public int compare(Revision r1, Revision r2) {
+ if (r1.getClusterId() != r2.getClusterId()) {
+ return r1.getClusterId() - r2.getClusterId();
+ } else if (r1.getTimestamp() != r2.getTimestamp()) {
+ return r1.getTimestamp() > r2.getTimestamp() ? 1 : -1;
+ } else {
+ return r1.getCounter() - r2.getCounter();
+ }
+ }});
+
+ UpdateOp clean = new UpdateOp(doc.getId(), false);
+ Revision last = null;
+ for (Revision r : revs) {
+ if (last != null) {
+ if (last.getClusterId() == r.getClusterId()) {
+ clean.removeMapEntry(propertyName, last);
+ }
+ }
+ last = r;
+ }
Review Comment:
* this seems to be a bit a more broader GC than expected. It deletes
basically all older revisions of the same clusterId. That seems fine - but at
that point I'm wondering why restrict it to only branch commits - i.e why
specifically only branch commits. Which would lead to the assumption that the
idea was to only remove "overwritten branch commits"?
* is the clusterId check here still necessary given the predicate check
introduced?
* this doesn't take the usual 24h GC max time nor active checkpoints into
account.
##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/util/Utils.java:
##########
@@ -274,6 +277,61 @@ private static String diagsForEntry(Map.Entry<String,
PropertyStats> member) {
}
}
+ /**
+ * Produce an {@link UpdateOp} suitable for shrinking branch revision
entries for given property in {@link Document}, {@code null} otherwise.
+ *
+ * @param doc document to inspect for repeated branch commits
+ * @param propertName property to check for
+ * @param revisionChecker filter for revisions (for instance, to check for
cluster id)
+ * @return {@link UpdateOp} suitable for shrinking document, {@code null}
otherwise
+ */
+ public static @Nullable UpdateOp getShrinkOp(Document doc, String
propertyName, Predicate<Revision> revisionChecker) {
+ Object t_bc = doc.get("_bc");
+ Object t_property = doc.get(propertyName);
+ if (t_bc instanceof Map && t_property instanceof Map) {
+ @SuppressWarnings("unchecked")
+ Map<Revision, String> _bc = (Map<Revision, String>)t_bc;
+ @SuppressWarnings("unchecked")
+ Map<Revision, String> pMap = (Map<Revision, String>)t_property;
+ List<Revision> revs = new ArrayList<>();
+ for (Map.Entry<Revision, String> en : pMap.entrySet()) {
+ Revision r = en.getKey();
+ if (revisionChecker.apply(r)) {
+ String bcv = _bc.get(r);
+ if ("true".equals(bcv)) {
+ revs.add(r);
+ }
+ }
+ }
+ // sort by age
+ Collections.sort(revs, new Comparator<Revision>() {
Review Comment:
wondering if there isn't such a `Comparator` in oak land already?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]