ayush-san commented on a change in pull request #3103:
URL: https://github.com/apache/iceberg/pull/3103#discussion_r718147313



##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
##########
@@ -283,6 +287,7 @@ private void commitDeltaTxn(NavigableMap<Long, WriteResult> pendingResults, Stri
         // merged one will lead to the incorrect delete semantic.
         WriteResult result = e.getValue();
         RowDelta rowDelta = table.newRowDelta()
+            .validateFromSnapshot(lastCommittedSnapshotId)

Review comment:
      @Reo-LEI Can you please help me understand how #2867 helps in solving this [validation error](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L435)?
   
   I agree with you that we can handle the `lastCommittedSnapshotId` in a separate PR and get this one reviewed, because it will really speed up the commit time, which keeps growing as the table's snapshot history grows. I have seen my Flink job's checkpoint time increase from 100-200ms to 10-15 minutes.
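   
   For context, here is a minimal sketch of the commit path this thread is about, showing how `validateFromSnapshot` bounds the validation window. The helper method and its arguments are illustrative only; the real `IcebergFilesCommitter#commitDeltaTxn` iterates a `NavigableMap` of pending results.
   
   ```java
   import org.apache.iceberg.DataFile;
   import org.apache.iceberg.DeleteFile;
   import org.apache.iceberg.RowDelta;
   import org.apache.iceberg.Table;
   import org.apache.iceberg.io.WriteResult;
   
   class DeltaCommitSketch {
     // Illustrative helper; the real committer tracks checkpoint ids and
     // commits each pending WriteResult as its own RowDelta.
     static void commitDelta(Table table, WriteResult result, long lastCommittedSnapshotId) {
       RowDelta rowDelta = table.newRowDelta()
           // Only validate snapshots written after our last successful commit,
           // instead of replaying the table's whole history on every checkpoint.
           .validateFromSnapshot(lastCommittedSnapshotId);
   
       for (DataFile dataFile : result.dataFiles()) {
         rowDelta.addRows(dataFile);
       }
       for (DeleteFile deleteFile : result.deleteFiles()) {
         rowDelta.addDeletes(deleteFile);
       }
   
       rowDelta.commit();
     }
   }
   ```
   
   Without the `validateFromSnapshot` call, conflict validation has to walk back through much older table history, which is presumably why commit time grows over time.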

##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
##########
@@ -283,6 +287,7 @@ private void commitDeltaTxn(NavigableMap<Long, WriteResult> pendingResults, Stri
         // merged one will lead to the incorrect delete semantic.
         WriteResult result = e.getValue();
         RowDelta rowDelta = table.newRowDelta()
+            .validateFromSnapshot(lastCommittedSnapshotId)

Review comment:
      Yes, but with your PR we can run the snapshot expiry task while the Flink job is running, since you are updating the `lastCommittedSnapshotId`. The only case left is that when we start the Flink job from a checkpoint, we will hit the same problem.
   
   But if we are doing that for one case, we can mimic the same for the case where we restore the job from a checkpoint (see the sketch below). Anyway, we can tackle this in a separate PR and discuss it with @rdblue and @openinx.
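   
   For the restore case, one possible shape would be to carry the id in Flink operator state. This is just a sketch, not the actual committer's state layout; the state descriptor name and fields are made up.
   
   ```java
   import org.apache.flink.api.common.state.ListState;
   import org.apache.flink.api.common.state.ListStateDescriptor;
   import org.apache.flink.api.common.typeinfo.Types;
   import org.apache.flink.runtime.state.StateInitializationContext;
   import org.apache.flink.runtime.state.StateSnapshotContext;
   
   // Sketch of persisting lastCommittedSnapshotId across restarts; the
   // descriptor name and field are hypothetical, not IcebergFilesCommitter's
   // real checkpoint state.
   abstract class SnapshotIdStateSketch {
     private transient ListState<Long> snapshotIdState;
     private long lastCommittedSnapshotId = -1L;
   
     void initializeState(StateInitializationContext context) throws Exception {
       snapshotIdState = context.getOperatorStateStore().getListState(
           new ListStateDescriptor<>("iceberg-last-committed-snapshot-id", Types.LONG));
       if (context.isRestored()) {
         // Re-seed the validation start point from the checkpoint, so the
         // first commit after restore does not fall back to full history.
         for (Long snapshotId : snapshotIdState.get()) {
           lastCommittedSnapshotId = snapshotId;
         }
       }
     }
   
     void snapshotState(StateSnapshotContext context) throws Exception {
       snapshotIdState.clear();
       snapshotIdState.add(lastCommittedSnapshotId);
     }
   }
   ```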




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


