mitchellciupak opened a new pull request, #2025:
URL: https://github.com/apache/iceberg-rust/pull/2025

   ## Which issue does this PR close?
   
   ### Purpose
   
   This PR does not close any existing issues. It addresses an optimization 
opportunity in the fast append workflow.
   
   ### Use Case
   
   I need to validate data files before committing them to the table. 
Currently, `validate_added_data_files()` is called internally during `commit()`, 
so validation runs on every commit attempt, including retries.
   
   ### Enhancement
   
   By exposing `validate_added_data_files()` as a public method, I can perform 
validation once before attempting a commit. Commit retries can then proceed 
without re-running validation, which reduces overhead in retry scenarios.
   
   This is a performance optimization that provides more control over the 
validation/commit lifecycle.
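
The lifecycle described above can be sketched as follows. This is a minimal illustration, not the iceberg-rust API: `DataFile`, `AppendError`, `append_with_retries`, and the partition-spec check are all hypothetical stand-ins chosen to show validation running once while only the commit step is retried.

```rust
// Hypothetical stand-ins for illustration; these are NOT iceberg-rust types.
#[derive(Debug, Clone)]
struct DataFile {
    path: String,
    partition_spec_id: i32,
}

#[derive(Debug, PartialEq)]
enum AppendError {
    Invalid(String),
    Conflict,
}

/// Illustrative validation rule (an assumption): every added file must use
/// the table's default partition spec id.
fn validate_added_data_files(
    files: &[DataFile],
    default_spec_id: i32,
) -> Result<(), AppendError> {
    for file in files {
        if file.partition_spec_id != default_spec_id {
            return Err(AppendError::Invalid(format!(
                "{} uses partition spec {}, expected {}",
                file.path, file.partition_spec_id, default_spec_id
            )));
        }
    }
    Ok(())
}

/// Validate once up front, then retry only the commit step on conflicts.
/// Returns the attempt number that succeeded.
fn append_with_retries<F>(
    files: &[DataFile],
    default_spec_id: i32,
    max_attempts: u32,
    mut try_commit: F,
) -> Result<u32, AppendError>
where
    F: FnMut(&[DataFile]) -> Result<(), AppendError>,
{
    // Runs exactly once, regardless of how many commit attempts follow.
    validate_added_data_files(files, default_spec_id)?;

    for attempt in 1..=max_attempts {
        match try_commit(files) {
            Ok(()) => return Ok(attempt),
            // A conflict is retryable; the files were already validated.
            Err(AppendError::Conflict) => continue,
            Err(other) => return Err(other),
        }
    }
    Err(AppendError::Conflict)
}
```

In this sketch the `try_commit` closure stands in for a commit that has internal validation disabled, so the cost of validation is paid once rather than once per attempt.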
   
   ## What changes are included in this PR?
   
   This commit adds an option to `FastAppendAction` to disable the validation 
step `snapshot_producer.validate_added_data_files()` during commits. This is 
similar to the existing option to disable `snapshot_producer.validate_duplicate_files()`.
   
   - Adds an option/flag to FastAppendAction to perform or disable validation 
of added data files when appending.
   - Wires the option through the relevant code paths in `append.rs`.
   
   The change is implemented in `crates/iceberg/src/transaction/append.rs`.
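
As a rough sketch of the shape of such a flag: the struct below is a simplified, hypothetical model and not the actual `FastAppendAction`. The setter name `with_validate_added_files`, the defaults, and the checks inside `commit` are assumptions for illustration; only `with_check_duplicate()` is named in this PR description.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for illustration; NOT the iceberg-rust type.
#[derive(Debug, Clone)]
struct DataFile {
    path: String,
    partition_spec_id: i32,
}

/// Simplified model of an append action with two opt-out checks.
struct FastAppendSketch {
    check_duplicate: bool,
    validate_added_files: bool,
    added: Vec<DataFile>,
}

impl FastAppendSketch {
    /// Both checks are enabled by default (an assumed default).
    fn new() -> Self {
        Self { check_duplicate: true, validate_added_files: true, added: Vec::new() }
    }

    /// Mirrors the existing duplicate-check toggle mentioned in the PR.
    fn with_check_duplicate(mut self, enabled: bool) -> Self {
        self.check_duplicate = enabled;
        self
    }

    /// Hypothetical name for the new toggle; the real name may differ.
    fn with_validate_added_files(mut self, enabled: bool) -> Self {
        self.validate_added_files = enabled;
        self
    }

    fn add_data_files(mut self, files: impl IntoIterator<Item = DataFile>) -> Self {
        self.added.extend(files);
        self
    }

    /// Runs only the checks that are enabled, then reports the file count.
    fn commit(&self, default_spec_id: i32) -> Result<usize, String> {
        if self.validate_added_files {
            for file in &self.added {
                if file.partition_spec_id != default_spec_id {
                    return Err(format!("invalid partition spec for {}", file.path));
                }
            }
        }
        if self.check_duplicate {
            let mut seen = HashSet::new();
            for file in &self.added {
                // `insert` returns false when the path was already present.
                if !seen.insert(file.path.as_str()) {
                    return Err(format!("duplicate file {}", file.path));
                }
            }
        }
        Ok(self.added.len())
    }
}
```

A caller that has already validated its files could then build the action with `FastAppendSketch::new().with_validate_added_files(false)` and retry `commit` without repeating that check.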
   
   ## Are these changes tested?
   
   These changes have been manually tested outside the test framework. I 
noticed that the existing `with_check_duplicate()` method also lacks test 
coverage. If helpful, I can add tests for both `with_check_duplicate()` and the 
new `validate_added_data_files()` method in this PR. I'm not sure if either 
change was small enough to be considered out of scope for the project's test 
strategy.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

