[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3945: Core: Use changed partition to validate file confilct

GitBox Tue, 25 Jan 2022 08:45:10 -0800


aokolnychyi commented on a change in pull request #3945:
URL: https://github.com/apache/iceberg/pull/3945#discussion_r791185127




##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -89,6 +89,7 @@
   private final List<ManifestFile> rewrittenAppendManifests = Lists.newArrayList();
   private final SnapshotSummary.Builder addedFilesSummary = SnapshotSummary.builder();
   private final SnapshotSummary.Builder appendedManifestsSummary = SnapshotSummary.builder();
+  private final Set<StructLike> changedPartitions = Sets.newHashSet();

Review comment:
       That's a very good point.
   
   We should either use `PartitionSet` if we need to have partitions per spec 
or just use `StructLikeSet`. We cannot use a regular set, as we may get 
arbitrary `StructLike` implementations that do not share `equals`/`hashCode` 
semantics.
   
   Most likely, we will need `PartitionSet`. Can we add a test with multiple 
specs?
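
   To illustrate the concern about regular sets, here is a minimal, 
self-contained sketch. It does not use Iceberg classes: `Row`, `ArrayRow`, 
`ListRow`, and `RowKey` are hypothetical stand-ins invented for this example, 
and `RowKey` only approximates the idea of comparing structs by field values. 
It is not how `StructLikeSet` is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StructSetSketch {
  // Hypothetical stand-in for a struct interface: positional access only,
  // with no equals/hashCode contract shared across implementations.
  interface Row {
    int size();
    Object get(int pos);
  }

  // Two unrelated implementations that can hold the same logical tuple.
  static final class ArrayRow implements Row {
    private final Object[] values;
    ArrayRow(Object... values) { this.values = values; }
    public int size() { return values.length; }
    public Object get(int pos) { return values[pos]; }
  }

  static final class ListRow implements Row {
    private final List<Object> values;
    ListRow(Object... values) { this.values = Arrays.asList(values); }
    public int size() { return values.size(); }
    public Object get(int pos) { return values.get(pos); }
  }

  // Hypothetical wrapper that defines equality by field values.
  static final class RowKey {
    private final List<Object> fields = new ArrayList<>();
    RowKey(Row row) {
      for (int i = 0; i < row.size(); i++) {
        fields.add(row.get(i));
      }
    }
    @Override public boolean equals(Object o) {
      return o instanceof RowKey && fields.equals(((RowKey) o).fields);
    }
    @Override public int hashCode() { return fields.hashCode(); }
  }

  // A plain HashSet falls back to identity equality, so the same logical
  // partition tuple is stored twice.
  static int plainSetSize() {
    Set<Row> set = new HashSet<>();
    set.add(new ArrayRow("2022-01-25", 7));
    set.add(new ListRow("2022-01-25", 7));
    return set.size();
  }

  // Wrapping each row in a value-based key deduplicates the tuple.
  static int wrappedSetSize() {
    Set<RowKey> set = new HashSet<>();
    set.add(new RowKey(new ArrayRow("2022-01-25", 7)));
    set.add(new RowKey(new ListRow("2022-01-25", 7)));
    return set.size();
  }
}
```

   In this sketch the plain set ends up with two entries for one logical 
partition, while the wrapped set keeps one.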

##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -89,6 +89,7 @@
   private final List<ManifestFile> rewrittenAppendManifests = Lists.newArrayList();
   private final SnapshotSummary.Builder addedFilesSummary = SnapshotSummary.builder();
   private final SnapshotSummary.Builder appendedManifestsSummary = SnapshotSummary.builder();
+  private final Set<StructLike> changedPartitions = Sets.newHashSet();

Review comment:
       I think we should even add a style rule to prohibit regular sets/maps 
for `StructLike` in a separate PR.

##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -182,20 +183,23 @@ protected void dropPartition(int specId, StructLike partition) {
     // dropping the data in a partition also drops all deletes in the partition
     filterManager.dropPartition(specId, partition);
     deleteFilterManager.dropPartition(specId, partition);
+    changedPartitions.add(partition);

Review comment:
       The spec ID should matter, right?
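
   A rough sketch of why the spec ID matters: the same tuple of values can 
denote different partitions under different partition specs, so the tracked 
set needs to be keyed by spec ID as well. The class below is hypothetical and 
uses plain Java collections with value-based `List` equality. It is only an 
illustration of the idea, not the `PartitionSet` implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical spec-aware collection of partition tuples.
class SpecAwarePartitions {
  // One set of partition tuples per partition spec ID.
  private final Map<Integer, Set<List<Object>>> bySpec = new HashMap<>();

  void add(int specId, Object... partition) {
    // Copy the values so later changes to the caller's array have no effect.
    bySpec.computeIfAbsent(specId, id -> new HashSet<>())
        .add(Arrays.asList(partition.clone()));
  }

  boolean contains(int specId, Object... partition) {
    Set<List<Object>> partitions = bySpec.get(specId);
    return partitions != null && partitions.contains(Arrays.asList(partition));
  }
}
```

   With this shape, adding the tuple `(2022)` under spec 0 does not make 
`contains(1, 2022)` return true, which is the behavior a test with multiple 
specs would check.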

##########
File path: core/src/test/java/org/apache/iceberg/TestOverwriteWithValidation.java
##########
@@ -333,6 +333,31 @@ public void testOverwriteCompatibleAdditionStrictValidated() {
         committedSnapshotId, table.currentSnapshot().snapshotId());
   }
 
+  @Test
+  public void testOverwriteCompatibleAdditionStrictValidatedNoConflict() {

Review comment:
       Well, I am not sure I understand this completely.
   
   Under the current implementation, this test would throw a validation 
exception because the conflict detection filter is `true`. That means any 
concurrent modification should be considered a conflict. I think that's the 
correct behavior. It happens when a row-level operation does not have a 
predicate that can be pushed down, for instance, `UPDATE t SET col = 1`. Any 
append should be considered a conflict in this case. Otherwise, suppose we had 
files only in partition B and then got an UPDATE statement without a 
condition. While the UPDATE statement was executing, someone concurrently 
added new files to partition A. The UPDATE would only delete files from 
partition B, but the files concurrently added to partition A should still 
fail the operation, since an unconditional UPDATE applies to those rows too.
   
   @coolderli, will adding a partition predicate to your MERGE statement solve 
the issue? You can provide a predicate in the ON condition and that will be 
used as the conflict detection filter.
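
   The effect of the filter can be sketched as follows. This is a simplified 
model, not Iceberg code: `ConflictFilterSketch` and `hasConflict` are 
hypothetical names, and partitions are reduced to plain strings.

```java
import java.util.List;
import java.util.function.Predicate;

class ConflictFilterSketch {
  // Reports a conflict if any concurrently added file lands in a partition
  // that matches the conflict detection filter.
  static boolean hasConflict(
      Predicate<String> conflictFilter, List<String> concurrentlyAddedPartitions) {
    return concurrentlyAddedPartitions.stream().anyMatch(conflictFilter);
  }
}
```

   With a filter that is always true (no pushed-down predicate), a concurrent 
append to partition A conflicts. With a filter restricted to partition B, the 
same append does not.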



