mbutrovich commented on code in PR #15470:
URL: https://github.com/apache/iceberg/pull/15470#discussion_r2914611829


##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java:
##########
@@ -295,18 +294,14 @@ private Result rebuildMetadata() {
     RewriteContentFileResult rewriteManifestResult =
         rewriteManifests(deltaSnapshots, endMetadata, rewriteManifestListResult.toRewrite());
 
-    // rebuild position delete files
-    Set<DeleteFile> deleteFiles =
-        rewriteManifestResult.toRewrite().stream()
-            .filter(e -> e instanceof DeleteFile)
-            .map(e -> (DeleteFile) e)
-            .collect(Collectors.toSet());
-    rewritePositionDeletes(deleteFiles);
+    int rewrittenDeleteFilesCount =
+        (int)
+            rewriteManifestResult.toRewrite().stream().filter(e -> e instanceof DeleteFile).count();

Review Comment:
   Fixed, thanks!



##########
core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java:
##########
@@ -445,25 +451,36 @@ private static RewriteResult<DeleteFile> writeDeleteFileEntry(
       String sourcePrefix,
       String targetPrefix,
       String stagingLocation,
-      ManifestWriter<DeleteFile> writer) {
+      ManifestWriter<DeleteFile> writer,
+      FileIO io,
+      PositionDeleteReaderWriter posDeleteReaderWriter) {
 
     DeleteFile file = entry.file();
     RewriteResult<DeleteFile> result = new RewriteResult<>();
 
     switch (file.content()) {
       case POSITION_DELETES:
-        DeleteFile posDeleteFile = newPositionDeleteEntry(file, spec, sourcePrefix, targetPrefix);
+        // Rewrite inline so the manifest records the actual file size, which changes because
+        // embedded data file paths are rewritten. The staging path is deterministic, so
+        // duplicates across manifests simply overwrite with identical content.
+        String staging = stagingPath(file.location(), sourcePrefix, stagingLocation);

Review Comment:
   Fixed, thanks!
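
   For context, here is a minimal sketch of why a deterministic staging path
   makes duplicate rewrites harmless, and why the rewritten size has to be
   measured after the rewrite. Everything in it is hypothetical: the class
   `StagingPathSketch`, its `stagingPath` and `rewriteContent` helpers, and
   the in-memory map standing in for `FileIO` are assumptions for
   illustration, not the code in `RewriteTablePathUtil`.

```java
import java.util.HashMap;
import java.util.Map;

class StagingPathSketch {
  // Hypothetical helper: derive the staging path purely from the inputs, so the
  // same source file always maps to the same staging file.
  static String stagingPath(String location, String sourcePrefix, String stagingLocation) {
    if (!location.startsWith(sourcePrefix)) {
      throw new IllegalArgumentException(location + " is not under " + sourcePrefix);
    }
    String relative = location.substring(sourcePrefix.length());
    String base = stagingLocation.endsWith("/") ? stagingLocation : stagingLocation + "/";
    String rel = relative.startsWith("/") ? relative.substring(1) : relative;
    return base + rel;
  }

  // Hypothetical stand-in for rewriting the data file paths embedded in a
  // position delete file: swap the source prefix for the target prefix.
  static String rewriteContent(String content, String sourcePrefix, String targetPrefix) {
    return content.replace(sourcePrefix, targetPrefix);
  }

  public static void main(String[] args) {
    Map<String, String> staged = new HashMap<>(); // stand-in for a file system
    String source = "s3://old-bucket/table";
    String target = "s3://new-bucket/warehouse/table";
    String staging = "file:///tmp/staging";

    String deleteFile = source + "/data/pos-delete-1.parquet";
    String content = source + "/data/file-a.parquet,0";

    // The same delete file is referenced from two manifests. Each pass computes
    // the same staging path and the same rewritten content, so the second write
    // overwrites the first with identical bytes.
    for (int manifest = 0; manifest < 2; manifest++) {
      String path = stagingPath(deleteFile, source, staging);
      String rewritten = rewriteContent(content, source, target);
      staged.put(path, rewritten);
      // The size to record in the manifest is the rewritten size, which differs
      // from the original whenever the prefixes differ in length.
      System.out.println(path + ": " + content.length() + " -> " + rewritten.length());
    }
    System.out.println("staged files: " + staged.size());
  }
}
```

   The sketch models file content as a string; real position delete files are
   Parquet, Avro, or ORC, so the size change depends on encoding and is not a
   simple difference in prefix length.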



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

