rdblue commented on a change in pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#discussion_r719755694
##########
File path: core/src/main/java/org/apache/iceberg/SnapshotProducer.java
##########
@@ -167,7 +176,8 @@ public Snapshot apply() {
OutputFile manifestList = manifestListPath();
try (ManifestListWriter writer = ManifestLists.write(
- ops.current().formatVersion(), manifestList, snapshotId(),
parentSnapshotId, sequenceNumber)) {
+ ops.current().formatVersion(), manifestList, snapshotId(),
parentSnapshotId,
+ operation().equals(DataOperations.REPLACE) && sequenceNumber() !=
null ? sequenceNumber() : sequenceNumber)) {
Review comment:
I think this is a clever way to fix the problem. What we had considered
before is setting the sequence number of individual files rather than the
sequence number used for the manifest list. This is an easier way to set the
sequence number for all new data files in the snapshot.
I need to think about whether this is the right approach a bit more. As long
as we have a static sequence number, using inheritance is only a convenience.
We could set that static sequence number on the individual data or delete files
that we add in the commit. I tend to lean toward that solution because it
minimizes the places that use the overridden sequence number -- so we know that
the manifest list and manifests all have the latest sequence number that is
assigned to the snapshot rather than also being re-sequenced.
I'll think about the trade-off some more.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]