dramaticlly commented on code in PR #13720: URL: https://github.com/apache/iceberg/pull/13720#discussion_r2292307516
########## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ########## @@ -281,24 +281,32 @@ private String rebuildMetadata() { // rebuild version files RewriteResult<Snapshot> rewriteVersionResult = rewriteVersionFiles(endMetadata); Set<Snapshot> deltaSnapshots = deltaSnapshots(startMetadata, rewriteVersionResult.toRewrite()); - - Set<String> manifestsToRewrite = manifestsToRewrite(deltaSnapshots, startMetadata); Set<Snapshot> validSnapshots = Sets.difference(snapshotSet(endMetadata), snapshotSet(startMetadata)); + // rebuild manifest files + Set<ManifestFile> manifestsToRewrite = manifestsToRewrite(validSnapshots); Review Comment: > Is deltaSnapshots and validSnapshots same? No, I guess the naming can probably improve here, - deltaSnapshots computes by tracking all snapshots from all of table versions between (startVersion, endVersion] and also exclude any snapshots already present in startVerison (this is superset) - validSnapshots simply compare the snapshots between 2 table versions (startVersion and endVersion) So they can differ when snapshot expiration happens, some expired snapshot is only tracked in deltaSnapshot but no longer in validSnapshot, and I had to fix this in #12006 to avoid copy already deleted files. This patch change from using `deltaSnapshots` to `validSnapshots` to filter manifests by referenced snapshot id, I think the result is the same as manifests referenced by the expired snapshot shall be handled/deleted as part of snapshot expiration. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org