viirya commented on PR #2509:
URL: https://github.com/apache/iceberg-rust/pull/2509#issuecomment-4586094643

   Nice catch on the `HashSet`-eats-the-duplicates bug — the fail-fast rewrite 
reads cleanly. Two things before this lands, both building on @blackmwk's open 
comments:
   
   **The batch-internal check probably shouldn't be gated on 
`check_duplicate`.** I agree with @blackmwk's point on `append.rs:88`. The 
`check_duplicate` flag exists to let callers skip the *cross-snapshot* 
validation, which costs manifest I/O. The intra-batch check is pure in-memory 
work with no such cost, and skipping it means `commit()` can silently write a 
manifest that references the same `file_path` twice — i.e. exactly the 
corruption this PR is meant to prevent. I'd run 
`check_no_duplicate_paths_in_batch` unconditionally.
   
   Worth flagging that 
`test_fast_append_intra_batch_duplicate_check_can_be_disabled` currently *locks 
in* that behavior — it asserts two identical paths commit successfully when the 
flag is off. If the check becomes unconditional, that test's assertion should 
flip to expecting an error.
   
   **Scope is `FastAppendAction` only.** This fixes the fast-append path; it'd 
be worth a quick follow-up to confirm `MergeAppend` / overwrite don't share the 
same latent issue (they may route through different validation). Not blocking 
this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to