vanliu-tx commented on code in PR #4514:
URL: https://github.com/apache/iceberg/pull/4514#discussion_r847919934


##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -690,7 +690,7 @@ public List<ManifestFile> apply(TableMetadata base) {
         base.schema(), current != null ? current.dataManifests() : null);
     long minDataSequenceNumber = filtered.stream()
         .map(ManifestFile::minSequenceNumber)
-        .filter(seq -> seq > 0) // filter out unassigned sequence numbers in 
rewritten manifests
+        .filter(seq -> seq >= 0) // filter out unassigned sequence numbers in 
rewritten manifests

Review Comment:
   Assuming the following steps
   1. create v1 table and append some data files.
   2. upgrade to v2 and commit.
   3. append some row-delta delete and commit.
   4. remove one data file and commit.
   5. remove another data file and commit.
   
   ```
       long minDataSequenceNumber = filtered.stream()
           .map(ManifestFile::minSequenceNumber)
           .filter(seq -> seq > 0) // filter out unassigned sequence numbers in 
rewritten manifests
           .reduce(base.lastSequenceNumber(), Math::min);
   ```
   Because all v1 data files has sequence number 0, the minDataSequenceNumber 
will use base.lastSequenceNumber() as it's value. In step 4 this value is 1, 
deleteFilterManager.dropDeleteFilesOlderThan(minDataSequenceNumber) will not 
drop delete-files in step 3. But in step 5, this value turns to 2, which will 
drop delete-files in step 3.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to