Re: [PR] Core: Track duplicate DVs for data file and merge them before committing [iceberg]

via GitHub Tue, 13 Jan 2026 07:10:38 -0800


RussellSpitzer commented on code in PR #15006:
URL: https://github.com/apache/iceberg/pull/15006#discussion_r2686844618



##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -87,6 +96,7 @@ abstract class MergingSnapshotProducer<ThisT> extends SnapshotProducer<ThisT> {
   private final Map<Integer, DataFileSet> newDataFilesBySpec = Maps.newHashMap();
   private Long newDataFilesDataSequenceNumber;
   private final Map<Integer, DeleteFileSet> newDeleteFilesBySpec = Maps.newHashMap();
+  private final Map<String, DeleteFileSet> duplicateDVsForDataFile = Maps.newHashMap();

Review Comment:
   Are we saving that much by keeping this in an optionally populated map? I 
feel like "newDVRefs" could just be a `Map<String, DeleteVector>` keyed by 
data file name.
   
   That would increase our memory usage by the name of every data file, but I 
feel like that can't be much, since we are already storing all of the DataFile 
objects in full. It just feels like things would be easier that way.
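
   For illustration only, here is a rough sketch of one way a single map keyed 
by data file name could work. Everything in it is hypothetical: the class 
`DvRefsSketch`, the `addDv` and `dvRefs` methods, and the use of a sorted set of 
row positions as a stand-in for a delete vector are assumptions, not Iceberg 
APIs. It also assumes duplicates are merged as they are added, which may not 
match what the PR ends up doing before commit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch, not Iceberg code: one map keyed by data file name,
// where a second DV for the same data file is folded into the first.
public class DvRefsSketch {
  // Stand-in for "newDVRefs": data file name -> deleted row positions.
  // A TreeSet<Long> is only a placeholder for a real delete vector.
  private final Map<String, TreeSet<Long>> newDvRefs = new HashMap<>();

  // Registers a DV for a data file; duplicates for the same file are merged.
  public void addDv(String dataFileName, long... deletedPositions) {
    TreeSet<Long> merged = newDvRefs.computeIfAbsent(dataFileName, k -> new TreeSet<>());
    for (long position : deletedPositions) {
      merged.add(position);
    }
  }

  public Map<String, TreeSet<Long>> dvRefs() {
    return newDvRefs;
  }

  public static void main(String[] args) {
    DvRefsSketch sketch = new DvRefsSketch();
    sketch.addDv("data/file-a.parquet", 1L, 5L);
    sketch.addDv("data/file-a.parquet", 5L, 9L); // duplicate DV for file-a
    sketch.addDv("data/file-b.parquet", 2L);
    System.out.println(sketch.dvRefs());
  }
}
```

   The extra cost over the current approach is one String key per data file 
that has a DV, which is the memory trade-off described above.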



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


