blackmwk commented on code in PR #2509:
URL: https://github.com/apache/iceberg-rust/pull/2509#discussion_r3327914391


##########
crates/iceberg/src/transaction/snapshot.rs:
##########
@@ -508,3 +508,19 @@ impl<'a> SnapshotProducer<'a> {
         Ok(ActionCommit::new(updates, requirements))
     }
 }
+
+pub(super) fn check_no_duplicate_paths_in_batch(files: &[DataFile]) -> 
Result<()> {

Review Comment:
   Why putting it in `SnapshotProducer`? I think we should just put it in 
`FastAppendAction`.



##########
crates/iceberg/src/transaction/append.rs:
##########
@@ -84,6 +85,10 @@ impl FastAppendAction {
 #[async_trait]
 impl TransactionAction for FastAppendAction {
     async fn commit(self: Arc<Self>, table: &Table) -> Result<ActionCommit> {
+        if self.check_duplicate {

Review Comment:
   I think we should always add this check. The `check_duplicate` flag is used 
to save checking against existing data files in snapshot since it may introduce 
extra io. This is not the case for the newly added data files, and if we don't 
do the check, it's a bug.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to