vanliu-tx commented on code in PR #4514:
URL: https://github.com/apache/iceberg/pull/4514#discussion_r847919934
##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -690,7 +690,7 @@ public List<ManifestFile> apply(TableMetadata base) {
base.schema(), current != null ? current.dataManifests() : null);
long minDataSequenceNumber = filtered.stream()
.map(ManifestFile::minSequenceNumber)
- .filter(seq -> seq > 0) // filter out unassigned sequence numbers in
rewritten manifests
+ .filter(seq -> seq >= 0) // filter out unassigned sequence numbers in
rewritten manifests
Review Comment:
Assuming the following steps
1. create v1 table and append some data files.
2. upgrade to v2 and commit.
3. append some row-delta delete and commit.
4. remove one data file and commit.
5. remove another data file and commit.
```
long minDataSequenceNumber = filtered.stream()
.map(ManifestFile::minSequenceNumber)
.filter(seq -> seq > 0) // filter out unassigned sequence numbers in
rewritten manifests
.reduce(base.lastSequenceNumber(), Math::min);
```
Because all v1 data files has sequence number 0, the minDataSequenceNumber
will use base.lastSequenceNumber() as it's value. In step 4 this value is 1,
deleteFilterManager.dropDeleteFilesOlderThan(minDataSequenceNumber) will not
drop delete-files in step 3. But in step 5, this value turns to 2, which will
drop delete-files in step 3.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]