dramaticlly commented on code in PR #13720:
URL: https://github.com/apache/iceberg/pull/13720#discussion_r2292307516


##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java:
##########
@@ -281,24 +281,32 @@ private String rebuildMetadata() {
     // rebuild version files
     RewriteResult<Snapshot> rewriteVersionResult = 
rewriteVersionFiles(endMetadata);
     Set<Snapshot> deltaSnapshots = deltaSnapshots(startMetadata, 
rewriteVersionResult.toRewrite());
-
-    Set<String> manifestsToRewrite = manifestsToRewrite(deltaSnapshots, 
startMetadata);
     Set<Snapshot> validSnapshots =
         Sets.difference(snapshotSet(endMetadata), snapshotSet(startMetadata));
 
+    // rebuild manifest files
+    Set<ManifestFile> manifestsToRewrite = manifestsToRewrite(validSnapshots);

Review Comment:
   > Is deltaSnapshots and validSnapshots same?
   
   No, I guess the naming can probably improve here, 
   - deltaSnapshots computes by tracking all snapshots from all of table 
versions between (startVersion, endVersion] and also exclude any snapshots 
already present in startVerison (this is superset)
   - validSnapshots simply compare the snapshots between 2 table versions 
(startVersion and endVersion)
   
   So they can differ when snapshot expiration happens, some expired snapshot 
is only tracked in deltaSnapshot but no longer in validSnapshot, and I had to 
fix this in #12006 to avoid copy already deleted files.
   
   This patch change from using `deltaSnapshots` to `validSnapshots` to filter 
manifests by referenced snapshot id, I think the result is the same as 
manifests referenced by the expired snapshot shall be handled/deleted as part 
of snapshot expiration. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to