dramaticlly commented on code in PR #15003:
URL: https://github.com/apache/iceberg/pull/15003#discussion_r2752486001


##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -645,6 +645,35 @@ protected boolean cleanupAfterCommit() {
     return true;
   }
 
+  /**

Review Comment:
   I guess it's a bit difficult to read with auto formater, but LINE 103 of 
https://github.com/apache/iceberg/blob/026ec35b7969539b535af03944fae53d059cd233/core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java#L103
 does populate the KEPT_MANIFESTS_COUNT and it's taken cared of.



##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -351,6 +351,62 @@ public void testDeleteFilesOnIndependentBranches() {
         statuses(Status.EXISTING, Status.DELETED, Status.DELETED));
   }
 
+  @TestTemplate
+  public void testDeleteFilesWithManifestSnapshotSummary() {

Review Comment:
   I added assertions to existing tests which cares about manifest, also for 
the TestRowDelta



##########
core/src/main/java/org/apache/iceberg/ManifestMergeManager.java:
##########
@@ -200,8 +217,9 @@ private ManifestFile createManifest(int specId, 
List<ManifestFile> bin) {
 
     ManifestFile manifest = writer.toManifestFile();
 
-    // update the cache
+    // update the cache and track replaced manifests
     mergedManifests.put(bin, manifest);
+    replacedManifestsCount.addAndGet(bin.size());

Review Comment:
   I think this piece of code is a bit tricky but we would actually overcount 
the replaced manifest if this count is moved in line 86. 
   
   Let me share a bit more, not all manifests in a group are actually merged 
for `toBeMerged`. Say we have a bin with size =1, of any bin with size < 
minCountToMerge (default to 100), those manifest will be kept as-is and we 
expect the manifest replaced count = 0.
   
   The manifest will only be merged when criteria met, and that's the reason 
why replaced count is tracked inside `createManifest` to handle cache hits on 
retries. 
   
   I also improve the code to only track the replaced manifest from previous 
snapshot  in ManifestMergeManager



##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -645,6 +645,34 @@ protected boolean cleanupAfterCommit() {
     return true;
   }
 
+  /**
+   * Builds a snapshot summary with manifest counts.
+   *
+   * @param manifests the list of manifests in the new snapshot
+   * @param replacedManifestsCount the count of manifests that were replaced 
(rewritten)
+   * @return a summary builder with manifest count metrics set
+   */
+  protected SnapshotSummary.Builder buildManifestCountSummary(
+      List<ManifestFile> manifests, int replacedManifestsCount) {
+    SnapshotSummary.Builder summaryBuilder = SnapshotSummary.builder();
+    int manifestsCreated = 0;
+    int manifestsKept = 0;
+
+    for (ManifestFile manifest : manifests) {
+      if (snapshotId() == manifest.snapshotId()) {

Review Comment:
   I guess if the manifest is written by ManifestWriter, it shall have snapshot 
configured on constructor. There's no strong reason to fail when we generate 
the optional snapshot summary, if you feel there's the need, we can only 
increment the kept-manifest count if  `manifest.snapshotId()` is non-null?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to