Li0k opened a new pull request, #1853:
URL: https://github.com/apache/iceberg-rust/pull/1853

   ## Which issue does this PR close?
   
   
   - Closes #.
   
   ## What changes are included in this PR?
   
   
   
   ## Summary
   Refactor `SnapshotProducer` validation methods to use internal state instead 
of requiring redundant parameters.
   
   ## Problem
   The current **SnapshotProducer** design already stores all the state its 
validation methods need, yet callers still pass that same data in again as 
parameters. I believe this could lead to the following issues:
   
   1. **Data mismatch risk**: Callers could pass different data than what's 
stored in `SnapshotProducer`, so that one set of files is validated while a 
different set is committed
   2. **API complexity**: As more validations are added (e.g., delete files, 
file existence checks), each method would require additional parameters, making 
the API harder to use
   3. **Redundant passing**: The same data that was already provided during 
construction has to be passed again
   
   ## Changes
   - Modified `validate_added_data_files()` and `validate_duplicate_files()` to 
operate on `self.added_data_files` directly
   - Updated `FastAppendAction::commit()` to call validation methods without 
passing the `added_data_files` parameter
   
   ## Motivation
   Previously, `added_data_files` was passed as a parameter to validation 
methods even though it was already stored in `SnapshotProducer`:
   
   ```rust
   // Before
   snapshot_producer.validate_added_data_files(&self.added_data_files)?;
   
   // After  
   snapshot_producer.validate_added_data_files()?;
   ```
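
To illustrate the shape of the refactor outside the real codebase, here is a minimal, self-contained sketch. `DataFile` and `SnapshotProducer` below are hypothetical, simplified stand-ins rather than the actual iceberg-rust types, and the in-batch duplicate check is only a placeholder for the real validation logic, which may differ.

```rust
use std::collections::HashSet;

// Hypothetical, simplified stand-in for the real data file type.
#[derive(Debug, Clone)]
struct DataFile {
    file_path: String,
}

// Hypothetical, simplified stand-in for `SnapshotProducer`.
// It owns the files that will later be committed.
struct SnapshotProducer {
    added_data_files: Vec<DataFile>,
}

impl SnapshotProducer {
    fn new(added_data_files: Vec<DataFile>) -> Self {
        Self { added_data_files }
    }

    // Validates the producer's own state. There is no parameter, so the
    // caller cannot validate one list of files and commit another.
    // Placeholder logic: rejects a path that appears twice in the batch.
    fn validate_duplicate_files(&self) -> Result<(), String> {
        let mut seen = HashSet::new();
        for file in &self.added_data_files {
            if !seen.insert(file.file_path.as_str()) {
                return Err(format!("duplicate data file: {}", file.file_path));
            }
        }
        Ok(())
    }
}
```

Because the method borrows only `self`, the files that get validated are by construction the same files the producer will commit.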
   
   ## Benefits
   1. Better encapsulation - validation operates on the object's own state
   2. Safer API - eliminates possibility of data mismatch
   3. Simpler interface - no redundant parameters needed
   
   ## Discussion
   
   Since **SnapshotProducer** already holds all necessary state, could we go 
a step further and run validation inside the **new** function itself, to 
improve data consistency and encapsulation?
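
A minimal sketch of what validating at construction time could look like, assuming a fallible constructor. The name `try_new`, the simplified `DataFile` and `SnapshotProducer` types, and the empty-path check are all hypothetical and are not part of the current API. If a real check needs table I/O, such a constructor would presumably have to be async as well; that is left out here.

```rust
// Hypothetical, simplified stand-in for the real data file type.
struct DataFile {
    file_path: String,
}

// Hypothetical, simplified stand-in for `SnapshotProducer`.
struct SnapshotProducer {
    added_data_files: Vec<DataFile>,
}

impl SnapshotProducer {
    // Hypothetical fallible constructor: a producer value only exists
    // if its state has already passed validation.
    fn try_new(added_data_files: Vec<DataFile>) -> Result<Self, String> {
        let producer = Self { added_data_files };
        producer.validate_added_data_files()?;
        Ok(producer)
    }

    // Placeholder check on the producer's own state: rejects empty paths.
    fn validate_added_data_files(&self) -> Result<(), String> {
        if self.added_data_files.iter().any(|f| f.file_path.is_empty()) {
            return Err("data file with empty path".to_string());
        }
        Ok(())
    }

    // Number of files held by an already validated producer.
    fn file_count(&self) -> usize {
        self.added_data_files.len()
    }
}
```

With this shape, callers such as a commit path would no longer need to remember to call the validation methods; the trade-off is that construction itself can fail and every call site has to handle that error.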
   
   
   ## Are these changes tested?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

