amogh-jahagirdar commented on code in PR #15006:
URL: https://github.com/apache/iceberg/pull/15006#discussion_r2825428187


##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -1065,9 +1062,34 @@ private List<ManifestFile> newDeleteFilesAsManifests() {
     }
 
     if (cachedNewDeleteManifests.isEmpty()) {
+      for (Map.Entry<String, List<DeleteFile>> entry : 
dvsByReferencedFile.entrySet()) {
+        if (entry.getValue().size() > 1) {
+          LOG.warn(
+              "Attempted to commit {} duplicate DVs for data file {} in table 
{}. "
+                  + "Merging duplicates, and original DVs will be orphaned.",
+              entry.getValue().size(),
+              entry.getKey(),
+              tableName);
+        }
+      }
+
+      List<DeleteFile> finalDVs =
+          DVUtil.mergeAndWriteDvsIfRequired(
+              dvsByReferencedFile,
+              ThreadPools.getDeleteWorkerPool(),
+              ops().locationProvider(),
+              ops().encryption(),
+              ops().io(),
+              ops().current().specsById());
+      // Prevent commiting duplicate V2 deletes by deduping them
+      Map<Integer, List<DeleteFile>> newDeleteFilesBySpec =
+          Streams.stream(Iterables.concat(finalDVs, 
DeleteFileSet.of(positionAndEqualityDeletes)))
+              .map(file -> Delegates.pendingDeleteFile(file, 
file.dataSequenceNumber()))

Review Comment:
   mergedDVs would be constructed with the right data sequence number but this 
was done because we validate that every DeleteFile that we want to commit is an 
instance of `PendingDeleteFile` otherwise it fails. I didn't want to do that in 
DVUtil because I didn't want to make assumptions for how a caller would use it. 
   
   But you're right we don't need this for positionAndEqualityDeletes. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to