nastra commented on code in PR #14351:
URL: https://github.com/apache/iceberg/pull/14351#discussion_r2502751556


##########
core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java:
##########
@@ -592,6 +641,66 @@ record = recordIt.next();
     }
   }
 
+  /**
+   * Rewrite a DV (Deletion Vector) file, updating the referenced data file paths in blob metadata.
+   *
+   * @param deleteFile source DV file to be rewritten
+   * @param outputFile output file to write the rewritten DV to
+   * @param io file io
+   * @param sourcePrefix source prefix that will be replaced
+   * @param targetPrefix target prefix to replace it
+   */
+  private static void rewriteDVFile(
+      DeleteFile deleteFile,
+      OutputFile outputFile,
+      FileIO io,
+      String sourcePrefix,
+      String targetPrefix)
+      throws IOException {
+    InputFile sourceFile = io.newInputFile(deleteFile.location());
+
+    try (org.apache.iceberg.puffin.PuffinReader reader =
+        org.apache.iceberg.puffin.Puffin.read(sourceFile).build()) {
+
+      List<org.apache.iceberg.puffin.BlobMetadata> blobs = reader.fileMetadata().blobs();
+
+      try (org.apache.iceberg.puffin.PuffinWriter writer =
+          org.apache.iceberg.puffin.Puffin.write(outputFile)
+              .createdBy(org.apache.iceberg.IcebergBuild.fullVersion())
+              .build()) {
+
+        // Read all blobs and rewrite them with updated referenced data file paths
+        for (Pair<org.apache.iceberg.puffin.BlobMetadata, java.nio.ByteBuffer> blobPair :
+            reader.readAll(blobs)) {
+          org.apache.iceberg.puffin.BlobMetadata blobMetadata = blobPair.first();
+          java.nio.ByteBuffer blobData = blobPair.second();
+
+          // Get the original properties and update the referenced data file path
+          Map<String, String> properties = Maps.newHashMap(blobMetadata.properties());
+          String referencedDataFile = properties.get("referenced-data-file");
+          if (referencedDataFile != null && referencedDataFile.startsWith(sourcePrefix)) {
+            String newReferencedDataFile = newPath(referencedDataFile, sourcePrefix, targetPrefix);
+            properties.put("referenced-data-file", newReferencedDataFile);
+          }
+
+          // Create a new blob with updated properties
+          org.apache.iceberg.puffin.Blob blob =
+              new org.apache.iceberg.puffin.Blob(
+                  blobMetadata.type(),
+                  blobMetadata.inputFields(),
+                  blobMetadata.snapshotId(),
+                  blobMetadata.sequenceNumber(),
+                  blobData,
+                  null, // compression codec (keep uncompressed)

Review Comment:
   ```suggestion
                     PuffinCompressionCodec.forName(blobMetadata.compressionCodec()),
   ```
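
For readers following the hunk, the property-rewrite step it performs on each blob can be sketched in isolation. This is only an illustration under stated assumptions, not the PR's code: `replacePrefix` is a hypothetical stand-in for the `newPath` helper called in the diff, and the class name and sample paths are made up. It uses only the JDK, so it leaves out the Puffin reader and writer entirely.

```java
import java.util.HashMap;
import java.util.Map;

public class ReferencedDataFileRewrite {
  // Property key taken from the diff above.
  static final String REFERENCED_DATA_FILE = "referenced-data-file";

  // Hypothetical stand-in for newPath(...): swaps a leading prefix.
  // Callers must check startsWith(sourcePrefix) first.
  static String replacePrefix(String path, String sourcePrefix, String targetPrefix) {
    return targetPrefix + path.substring(sourcePrefix.length());
  }

  // Copies the blob properties and rewrites the referenced data file path
  // only when it is present and lives under sourcePrefix.
  static Map<String, String> rewriteProperties(
      Map<String, String> original, String sourcePrefix, String targetPrefix) {
    Map<String, String> updated = new HashMap<>(original);
    String referenced = updated.get(REFERENCED_DATA_FILE);
    if (referenced != null && referenced.startsWith(sourcePrefix)) {
      updated.put(REFERENCED_DATA_FILE, replacePrefix(referenced, sourcePrefix, targetPrefix));
    }
    return updated;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    // Made-up sample values for illustration only.
    props.put(REFERENCED_DATA_FILE, "s3://old-bucket/db/tbl/data/f1.parquet");
    props.put("cardinality", "3");

    Map<String, String> rewritten =
        rewriteProperties(props, "s3://old-bucket/db", "s3://new-bucket/db");
    System.out.println(rewritten.get(REFERENCED_DATA_FILE));
  }
}
```

The sketch copies the map before mutating it, as the diff does with `Maps.newHashMap`, so every other property is carried over unchanged.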


