Reo-LEI commented on a change in pull request #3480:
URL: https://github.com/apache/iceberg/pull/3480#discussion_r744413376
##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -309,10 +312,11 @@ protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startin
* @param dataFilter a data filter
* @param dataFiles data files to validate have no new row deletes
* @param caseSensitive whether expression binding should be case-sensitive
+ * @param ignoreEqualityDeletes whether equality deletes should be ignored in validation
*/
protected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId,
Expression dataFilter,
Iterable<DataFile> dataFiles,
- boolean caseSensitive) {
+ boolean caseSensitive,
+ boolean ignoreEqualityDeletes) {
Review comment:
Maybe we should add a comment explaining why equality deletes can be
ignored here, and when they should or should not be ignored.
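One possible rationale such a comment could capture, sketched under the assumption that the Iceberg spec's sequence-number rules hold: an equality delete applies to a data file only when the data file's sequence number is strictly lower than the delete's, while a position delete applies only to the exact file path it names. If the rewritten data files keep the sequence number of the snapshot the rewrite started from, an equality delete committed concurrently still applies to them; a position delete pointing at a replaced file does not carry over. The classes below (`ToyDataFile`, `ToyDeleteFile`, `applies`) are hypothetical and only model that rule; they are not Iceberg's API.

```java
import java.util.List;

// Hypothetical model of the spec's delete-applicability rules; not Iceberg's API.
class DeleteApplicabilityDemo {

  enum DeleteKind { POSITION, EQUALITY }

  // A delete file with its sequence number; position deletes also name the data file they target.
  record ToyDeleteFile(DeleteKind kind, long seqNum, String targetPath) {}

  // A data file identified by its path and its (possibly inherited) sequence number.
  record ToyDataFile(String path, long seqNum) {}

  static boolean applies(ToyDeleteFile delete, ToyDataFile data) {
    if (delete.kind() == DeleteKind.EQUALITY) {
      // Equality deletes apply to any data file written strictly before them.
      return data.seqNum() < delete.seqNum();
    }
    // Position deletes apply only to the exact file path they reference.
    return data.path().equals(delete.targetPath()) && data.seqNum() <= delete.seqNum();
  }

  public static void main(String[] args) {
    ToyDataFile original = new ToyDataFile("data/a.parquet", 5);
    // The rewrite keeps the starting sequence number (5) instead of taking a new one.
    ToyDataFile rewritten = new ToyDataFile("data/a-compacted.parquet", 5);

    // Deletes committed concurrently with the rewrite, at sequence number 6.
    List<ToyDeleteFile> concurrent = List.of(
        new ToyDeleteFile(DeleteKind.EQUALITY, 6, null),
        new ToyDeleteFile(DeleteKind.POSITION, 6, "data/a.parquet"));

    for (ToyDeleteFile d : concurrent) {
      System.out.printf("%s delete: original=%b, rewritten=%b%n",
          d.kind(), applies(d, original), applies(d, rewritten));
    }
  }
}
```

In this model the equality delete still matches the rewritten file, while the position delete no longer does, which is why position deletes would still need validating even when equality deletes are skipped. Had the rewrite taken a new, higher sequence number (say 7), the concurrent equality delete would stop applying as well.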
##########
File path: core/src/main/java/org/apache/iceberg/ManifestWriter.java
##########
@@ -149,13 +150,17 @@ public long length() {
return writer.length();
}
+ void useSequenceNumber(long sequenceNumber) {
+ this.manifestSequenceNumber = sequenceNumber;
+ }
+
public ManifestFile toManifestFile() {
Preconditions.checkState(closed, "Cannot build ManifestFile, writer is not closed");
// if the minSequenceNumber is null, then no manifests with a sequence number have been written, so the min
// sequence number is the one that will be assigned when this is committed. pass UNASSIGNED_SEQ to inherit it.
long minSeqNumber = minSequenceNumber != null ? minSequenceNumber : UNASSIGNED_SEQ;
return new GenericManifestFile(file.location(), writer.length(), specId, content(),
- UNASSIGNED_SEQ, minSeqNumber, snapshotId,
+ manifestSequenceNumber, minSeqNumber, snapshotId,
Review comment:
I think there is a difference between #3204 and this PR. In this PR, the
specific seqNum is set on the manifest file, while the manifest list still
gets a new seqNum when the rewrite snapshot is committed. As a result, the
snapshot gets an incremented seqNum, while the data files get the specific
seqNum, because data files inherit their seqNum from the manifest file rather
than from the snapshot. We can then use the data files' seqNum to check
whether any new delete files apply to the rewritten data files.
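The inheritance chain described above can be sketched as a toy model. It assumes, as the `toManifestFile` comment in the diff suggests, that a manifest carrying `UNASSIGNED_SEQ` (modeled here as -1) inherits the committing snapshot's sequence number, while a manifest pinned to a specific number (as `useSequenceNumber` would do) keeps it, and that data files take their manifest's number. `ToyManifest`, `ToySnapshot`, `commit`, and `deleteIsNewer` are hypothetical stand-ins, not Iceberg classes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of sequence-number inheritance; all names are hypothetical stand-ins.
class SeqNumInheritanceDemo {
  // Placeholder meaning "inherit the snapshot's sequence number at commit time".
  static final long UNASSIGNED_SEQ = -1L;

  record ToyManifest(long seqNum) {}

  record ToySnapshot(long seqNum, List<ToyManifest> manifests) {}

  // Committing always assigns the next snapshot sequence number;
  // only manifests left unassigned inherit it.
  static ToySnapshot commit(long lastSeqNum, List<ToyManifest> written) {
    long snapshotSeq = lastSeqNum + 1;
    List<ToyManifest> resolved = new ArrayList<>();
    for (ToyManifest m : written) {
      long seq = m.seqNum() == UNASSIGNED_SEQ ? snapshotSeq : m.seqNum();
      resolved.add(new ToyManifest(seq));
    }
    return new ToySnapshot(snapshotSeq, List.copyOf(resolved));
  }

  // Data files inherit the sequence number of the manifest that lists them.
  static long dataFileSeqNum(ToyManifest manifest) {
    return manifest.seqNum();
  }

  // A delete whose seqNum exceeds the data file's was committed after that data.
  static boolean deleteIsNewer(long deleteSeq, long dataFileSeq) {
    return deleteSeq > dataFileSeq;
  }

  public static void main(String[] args) {
    long startingSeq = 5;   // seqNum of the snapshot the rewrite started from
    long lastCommitted = 6; // a concurrent delete was committed at 6
    // The rewrite pins its manifest to the starting seqNum.
    ToySnapshot rewrite = commit(lastCommitted, List.of(new ToyManifest(startingSeq)));
    long dataSeq = dataFileSeqNum(rewrite.manifests().get(0));
    System.out.println("snapshot seq = " + rewrite.seqNum() + ", data file seq = " + dataSeq);
    System.out.println("delete at 6 newer than rewritten files: " + deleteIsNewer(6, dataSeq));
  }
}
```

In this sketch the rewrite snapshot still advances to 7 while its data files report 5, so the concurrent delete at 6 remains distinguishable as newer than the rewritten data.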
In #3204, however, the snapshot's seqNum is overridden by the specific
seqNum, so we end up with two snapshots that have the same seqNum. This breaks
the monotonicity of snapshot seqNums, and we can no longer tell which snapshot
is newer because they share the same seqNum.
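To make the monotonicity argument concrete, the hypothetical check below walks snapshot seqNums in commit order and reports whether they strictly increase. The two lists are illustrative values for a starting snapshot (5) followed by the rewrite snapshot under each approach, not output from either PR.

```java
import java.util.List;

// Hypothetical check: snapshot seqNums should strictly increase in commit order.
class SnapshotMonotonicityDemo {
  static boolean strictlyIncreasing(List<Long> seqNums) {
    for (int i = 1; i < seqNums.size(); i++) {
      // A repeat (or decrease) means the newer snapshot can't be told apart by seqNum.
      if (seqNums.get(i) <= seqNums.get(i - 1)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Manifest pinned to 5, snapshot still gets a fresh seqNum.
    List<Long> pinnedManifest = List.of(5L, 6L);
    // Snapshot seqNum itself overridden with the starting seqNum.
    List<Long> overriddenSnapshot = List.of(5L, 5L);
    System.out.println("manifest pinned: " + strictlyIncreasing(pinnedManifest));
    System.out.println("snapshot overridden: " + strictlyIncreasing(overriddenSnapshot));
  }
}
```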
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.