amogh-jahagirdar commented on code in PR #15006:
URL: https://github.com/apache/iceberg/pull/15006#discussion_r2695054197


##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -87,7 +97,7 @@ abstract class MergingSnapshotProducer<ThisT> extends 
SnapshotProducer<ThisT> {
   private final Map<Integer, DataFileSet> newDataFilesBySpec = 
Maps.newHashMap();
   private Long newDataFilesDataSequenceNumber;
   private final Map<Integer, DeleteFileSet> newDeleteFilesBySpec = 
Maps.newHashMap();
-  private final Set<String> newDVRefs = Sets.newHashSet();
+  private final Map<String, DeleteFileSet> dvsByReferencedFile = 
Maps.newHashMap();

Review Comment:
   cc @rdblue @RussellSpitzer made this a multimap. I originally wanted this to 
be a fileScopedDeletes (e.g. also have v2 file scoped deletes) so that in case 
of any mistakes in implementation we could merge the v2 position deletes but I 
think the fundamental challenge is that we don't have a good way in the. 
Iceberg core modules to read the position deletes. All that code is in the data 
module. I think we'd have to do some questionable things with InternalData if 
we really cared about this case, but since in practice the Table APIs already 
protect us by preventing adding V2 pos deletes to v3 tables, we're already 
covered enough imo?
   
   Also, if we cared about state simplification we could probably go even 
further here by just tracking newDeletesBySpec, and then lazily evaluating the 
dvsByReferencedFile. However, since we already were tracking newDVRefs, I 
figured it's not that much more _state_ complexity to change it to a multimap.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to