amogh-jahagirdar commented on code in PR #15006:
URL: https://github.com/apache/iceberg/pull/15006#discussion_r2695054197
##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -87,7 +97,7 @@ abstract class MergingSnapshotProducer<ThisT> extends
SnapshotProducer<ThisT> {
private final Map<Integer, DataFileSet> newDataFilesBySpec =
Maps.newHashMap();
private Long newDataFilesDataSequenceNumber;
private final Map<Integer, DeleteFileSet> newDeleteFilesBySpec =
Maps.newHashMap();
- private final Set<String> newDVRefs = Sets.newHashSet();
+ private final Map<String, DeleteFileSet> dvsByReferencedFile =
Maps.newHashMap();
Review Comment:
cc @rdblue @RussellSpitzer made this a multimap. I originally wanted this to
be a fileScopedDeletes (e.g. also have v2 file scoped deletes) so that in case
of any mistakes in implementation we could merge the v2 position deletes but I
think the fundamental challenge is that we don't have a good way in the.
Iceberg core modules to read the position deletes. All that code is in the data
module. I think we'd have to do some questionable things with InternalData if
we really cared about this case, but since in practice the Table APIs already
protect us by preventing adding V2 pos deletes to v3 tables, we're already
covered enough imo?
Also, if we cared about state simplification we could probably go even
further here by just tracking newDeletesBySpec, and then lazily evaluating the
dvsByReferencedFile. However, since we already were tracking newDVRefs, I
figured it's not that much more _state_ complexity to change it to a multimap.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]