aokolnychyi commented on a change in pull request #3945:
URL: https://github.com/apache/iceberg/pull/3945#discussion_r791185127
##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -89,6 +89,7 @@
private final List<ManifestFile> rewrittenAppendManifests =
Lists.newArrayList();
private final SnapshotSummary.Builder addedFilesSummary =
SnapshotSummary.builder();
private final SnapshotSummary.Builder appendedManifestsSummary =
SnapshotSummary.builder();
+ private final Set<StructLike> changedPartitions = Sets.newHashSet();
Review comment:
That's a very good point.
We should either use `PartitionSet`, if we need to track partitions per spec,
or just use `StructLikeSet`. We cannot use a regular set because we may get
arbitrary `StructLike` implementations. Most likely, we will need `PartitionSet`.
Can we add a test with multiple specs?
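To illustrate why a regular set is risky here, below is a minimal, self-contained sketch. It does not use the Iceberg classes: `Struct`, `ArrayStruct`, `ListStruct`, and `Wrapper` are hypothetical stand-ins, and the wrapper only approximates the idea of comparing structs by field values rather than by implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class StructSetSketch {
  // Hypothetical, simplified stand-in for StructLike (not the real interface).
  interface Struct {
    int size();
    Object get(int pos);
  }

  // Array-backed implementation that inherits identity-based equals/hashCode.
  static final class ArrayStruct implements Struct {
    private final Object[] values;
    ArrayStruct(Object... values) { this.values = values; }
    @Override public int size() { return values.length; }
    @Override public Object get(int pos) { return values[pos]; }
  }

  // List-backed implementation, also without value-based equals/hashCode.
  static final class ListStruct implements Struct {
    private final List<Object> values;
    ListStruct(Object... values) { this.values = Arrays.asList(values); }
    @Override public int size() { return values.size(); }
    @Override public Object get(int pos) { return values.get(pos); }
  }

  // Wrapper that compares structs field by field, whatever the implementation.
  static final class Wrapper {
    private final Struct struct;
    Wrapper(Struct struct) { this.struct = struct; }

    @Override public boolean equals(Object o) {
      if (!(o instanceof Wrapper)) return false;
      Struct other = ((Wrapper) o).struct;
      if (other.size() != struct.size()) return false;
      for (int i = 0; i < struct.size(); i++) {
        if (!Objects.equals(struct.get(i), other.get(i))) return false;
      }
      return true;
    }

    @Override public int hashCode() {
      int hash = 1;
      for (int i = 0; i < struct.size(); i++) {
        hash = 31 * hash + Objects.hashCode(struct.get(i));
      }
      return hash;
    }
  }

  // A plain HashSet treats the same partition tuple as two distinct entries.
  static int plainSetSize() {
    Set<Struct> set = new HashSet<>();
    set.add(new ArrayStruct("2022-01-24", 7));
    set.add(new ListStruct("2022-01-24", 7));
    return set.size();
  }

  // Wrapping each struct makes the set deduplicate by field values.
  static int wrappedSetSize() {
    Set<Wrapper> set = new HashSet<>();
    set.add(new Wrapper(new ArrayStruct("2022-01-24", 7)));
    set.add(new Wrapper(new ListStruct("2022-01-24", 7)));
    return set.size();
  }
}
```

In this sketch the plain set keeps both entries for the same partition tuple, while the wrapped set keeps one.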
##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -89,6 +89,7 @@
private final List<ManifestFile> rewrittenAppendManifests =
Lists.newArrayList();
private final SnapshotSummary.Builder addedFilesSummary =
SnapshotSummary.builder();
private final SnapshotSummary.Builder appendedManifestsSummary =
SnapshotSummary.builder();
+ private final Set<StructLike> changedPartitions = Sets.newHashSet();
Review comment:
I think we should even add a style rule to prohibit regular sets/maps
for `StructLike` in a separate PR.
##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -182,20 +183,23 @@ protected void dropPartition(int specId, StructLike partition) {
// dropping the data in a partition also drops all deletes in the partition
filterManager.dropPartition(specId, partition);
deleteFilterManager.dropPartition(specId, partition);
+ changedPartitions.add(partition);
Review comment:
The spec ID should matter, right?
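As a rough sketch of what I mean: the same tuple can come from different specs and mean different things, so the tracked set should be keyed by spec ID as well. The class below is hypothetical and only models the idea; partition tuples are represented as plain lists so that equality is value-based.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical spec-aware partition tracker; a model of the idea only.
class SpecAwarePartitionsSketch {
  // Partition tuples grouped by the ID of the spec that produced them.
  private final Map<Integer, Set<List<?>>> bySpec = new HashMap<>();

  // Returns true if the (specId, partition) pair was not tracked before.
  boolean add(int specId, List<?> partition) {
    return bySpec.computeIfAbsent(specId, id -> new HashSet<>()).add(partition);
  }

  boolean contains(int specId, List<?> partition) {
    Set<List<?>> partitions = bySpec.get(specId);
    return partitions != null && partitions.contains(partition);
  }

  // Total number of distinct (specId, partition) pairs.
  int size() {
    return bySpec.values().stream().mapToInt(Set::size).sum();
  }
}
```

With this shape, the tuple `(1)` under spec 0 and the tuple `(1)` under spec 1 are tracked as two separate partitions, which a set keyed only by the tuple would merge.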
##########
File path: core/src/test/java/org/apache/iceberg/TestOverwriteWithValidation.java
##########
@@ -333,6 +333,31 @@ public void testOverwriteCompatibleAdditionStrictValidated() {
committedSnapshotId, table.currentSnapshot().snapshotId());
}
+ @Test
+ public void testOverwriteCompatibleAdditionStrictValidatedNoConflict() {
Review comment:
Well, I am not sure I understand this completely.
Under the current implementation, this test would throw a validation
exception because the conflict detection filter is `true`, which means any
concurrent modification should be considered a conflict. I think that's the
correct behavior. This happens when a row-level operation has no predicate that
can be pushed down, for instance `UPDATE t SET col = 1`. Any append should
be considered a conflict in this case. Otherwise, suppose we had files only in
partition B and then got an UPDATE statement without a condition. While the
UPDATE was executing, someone concurrently added new files to partition A. The
UPDATE deletes files only from partition B, but the files concurrently added to
partition A should still fail the operation, because their rows also match the
unconditional UPDATE and would otherwise be missed.
@coolderli, will adding a partition predicate to your MERGE statement solve
the issue? You can provide a predicate in the ON condition and that will be
used as the conflict detection filter.
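A simplified model of the check I am describing, in case it helps. This is not the Iceberg validation code: `hasConflict` is a hypothetical helper, and files are reduced to the name of the partition they were added to.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of conflict detection against concurrently added files.
class ConflictFilterSketch {
  // Returns true if any concurrently added file matches the conflict filter.
  static boolean hasConflict(
      Predicate<String> conflictFilter, List<String> concurrentlyAddedPartitions) {
    return concurrentlyAddedPartitions.stream().anyMatch(conflictFilter);
  }
}
```

With an always-true filter (the `UPDATE t SET col = 1` case), any concurrent append is reported as a conflict. With a filter restricted to partition B, a concurrent append to partition A alone is not.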
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]