Reo-LEI commented on a change in pull request #3480:
URL: https://github.com/apache/iceberg/pull/3480#discussion_r744413376



##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -309,10 +312,11 @@ protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startin
    * @param dataFilter a data filter
    * @param dataFiles data files to validate have no new row deletes
    * @param caseSensitive whether expression binding should be case-sensitive
+   * @param ignoreEqualityDeletes whether equality deletes should be ignored in validation
    */
   protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId,
                                                   Expression dataFilter, Iterable<DataFile> dataFiles,
-                                                  boolean caseSensitive) {
+                                                  boolean caseSensitive, boolean ignoreEqualityDeletes) {

Review comment:
       Maybe we should add a comment explaining why equality deletes can be
ignored here, and when they should or should not be ignored.

##########
File path: core/src/main/java/org/apache/iceberg/ManifestWriter.java
##########
@@ -149,13 +150,17 @@ public long length() {
     return writer.length();
   }
 
+  void useSequenceNumber(long sequenceNumber) {
+    this.manifestSequenceNumber = sequenceNumber;
+  }
+
   public ManifestFile toManifestFile() {
     Preconditions.checkState(closed, "Cannot build ManifestFile, writer is not closed");
     // if the minSequenceNumber is null, then no manifests with a sequence number have been written, so the min
     // sequence number is the one that will be assigned when this is committed. pass UNASSIGNED_SEQ to inherit it.
     long minSeqNumber = minSequenceNumber != null ? minSequenceNumber : UNASSIGNED_SEQ;
     return new GenericManifestFile(file.location(), writer.length(), specId, content(),
-        UNASSIGNED_SEQ, minSeqNumber, snapshotId,
+        manifestSequenceNumber, minSeqNumber, snapshotId,

Review comment:
       I think there is a difference between #3204 and this PR. In this PR,
the specified seqNum is set on the manifest file, and the manifest list file
still gets a new seqNum when the rewrite snapshot is committed. As a result,
the snapshot carries an increasing seqNum while the data files carry the
specified seqNum, because data files inherit their seqNum from the manifest
file, not from the snapshot. We can then use the data files' seqNum to check
whether any delete files apply to the rewritten data files.
   
   But in #3204, the snapshot's seqNum is overridden by the specified seqNum,
so we end up with two snapshots that have the same seqNum. That breaks the
monotonicity of snapshot seqNums, and we cannot tell which snapshot is the
newer one because both have the same seqNum.
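
   To make the difference concrete, here is a minimal sketch of the two
behaviours described above. It is not Iceberg code: the class, the helper
methods, the `-1` sentinel value, and the simplified "a delete applies to data
with a smaller seqNum" rule are all assumptions made only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of sequence-number inheritance; none of these names are Iceberg APIs.
public class SequenceNumberSketch {
  // Sentinel meaning "inherit from the snapshot"; the value is an assumption for this sketch.
  static final long UNASSIGNED_SEQ = -1L;

  // Sequence number a data file sees: the manifest's if it is set, else the snapshot's.
  static long dataFileSeq(long manifestSeq, long snapshotSeq) {
    return manifestSeq != UNASSIGNED_SEQ ? manifestSeq : snapshotSeq;
  }

  // True when every snapshot sequence number is larger than the one before it.
  static boolean strictlyIncreasing(List<Long> snapshotSeqs) {
    for (int i = 1; i < snapshotSeqs.size(); i++) {
      if (snapshotSeqs.get(i) <= snapshotSeqs.get(i - 1)) {
        return false;
      }
    }
    return true;
  }

  // Simplified rule: a delete applies to data with a smaller sequence number.
  static boolean deleteApplies(long deleteSeq, long dataSeq) {
    return deleteSeq > dataSeq;
  }

  public static void main(String[] args) {
    // This PR: the rewrite snapshot gets a fresh seqNum (3), the manifest is pinned to 1.
    long rewrittenSeq = dataFileSeq(1L, 3L);
    System.out.println(rewrittenSeq);
    // A delete committed at seqNum 2 still applies to the rewritten data.
    System.out.println(deleteApplies(2L, rewrittenSeq));
    System.out.println(strictlyIncreasing(Arrays.asList(1L, 2L, 3L)));
    // #3204 as described above: the snapshot itself is pinned to 1.
    System.out.println(strictlyIncreasing(Arrays.asList(1L, 2L, 1L)));
  }
}
```

   With the manifest pinned, the snapshot sequence stays 1, 2, 3 while the
rewritten data keeps seqNum 1. Pinning the snapshot instead yields 1, 2, 1,
which is no longer strictly increasing.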






