QuakeWang opened a new pull request, #318:
URL: https://github.com/apache/paimon-rust/pull/318

   
   ### Purpose
   
   
   Linked issue: closes #255
   
   This PR supports `changelog-producer=input` for primary-key table writes in 
the scoped Rust write path.
   
   The implementation double-writes primary-key input rows into changelog files 
while keeping normal table data files deduplicated, then commits changelog 
files through separate changelog manifest metadata.
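
The double-write idea above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Row` struct and the `split_input` function are hypothetical names, and deduplication is assumed to mean "keep the row with the highest sequence number per key".

```rust
use std::collections::BTreeMap;

/// Hypothetical row shape used only for this illustration;
/// it is not a type from paimon-rust.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub key: u32,
    pub sequence: u64,
    pub value: String,
}

/// Splits one batch of input rows into `(data_rows, changelog_rows)`.
///
/// Assumption: the changelog side receives every input row in sorted
/// order, while the data side keeps only the newest row per key.
pub fn split_input(mut input: Vec<Row>) -> (Vec<Row>, Vec<Row>) {
    // Sort by (key, sequence) so the last row seen for a key is the newest.
    input.sort_by_key(|r| (r.key, r.sequence));

    // Changelog side: the full sorted input, nothing dropped.
    let changelog = input.clone();

    // Data side: later rows overwrite earlier rows with the same key.
    let mut latest: BTreeMap<u32, Row> = BTreeMap::new();
    for row in input {
        latest.insert(row.key, row);
    }
    let data = latest.into_values().collect();

    (data, changelog)
}
```

With three input rows where two share a key, the changelog side would hold all three rows and the data side only two.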
   
   
   ### Brief change log
   
   
     - Add typed parsing for `changelog-producer` and changelog file options in 
`CoreOptions`.
     - Add `PreparedFiles` and propagate changelog files separately from normal 
data files.
     - Teach `KeyValueFileWriter` to write input changelog files from the full 
sorted input rows, while normal data files still contain only the rows selected by the merge engine.
     - Compute changelog `DataFileMeta` from changelog rows, including row 
count, key stats, sequence range, and retract row count.
     - Add `CommitMessage::new_changelog_files` and commit changelog entries 
into a separate changelog manifest list.
     - Populate snapshot `changelogManifestList` and `changelogRecordCount`.
     - Reject unsupported combinations early, including non-PK tables, 
non-deduplicate merge engines, cross-partition dynamic bucket, `bucket=-2`, 
`rowkind.field`, unsupported changelog producers, and overwrite with changelog 
files.
     - Preserve existing normal data manifest, index manifest, and table record 
count behavior.
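
As a rough illustration of the changelog metadata item in the list above, the statistics could be derived in a single pass over the changelog rows. This is only a sketch: `ChangelogRow`, `ChangelogSummary`, and `summarize` are hypothetical names rather than the PR's `DataFileMeta` API, the key is simplified to a single `i64`, and a retract row is assumed to be an update-before or delete row.

```rust
/// Row kinds, modeled here as a plain enum for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RowKind {
    Insert,
    UpdateBefore,
    UpdateAfter,
    Delete,
}

/// Hypothetical changelog row; not a paimon-rust type.
#[derive(Debug, Clone)]
pub struct ChangelogRow {
    pub key: i64,
    pub sequence: u64,
    pub kind: RowKind,
}

/// Hypothetical summary holding the statistics named in the change log.
#[derive(Debug, PartialEq, Eq)]
pub struct ChangelogSummary {
    pub row_count: usize,
    pub min_key: i64,
    pub max_key: i64,
    pub min_sequence: u64,
    pub max_sequence: u64,
    pub retract_row_count: usize,
}

/// Computes the summary in one pass; returns `None` for an empty batch.
pub fn summarize(rows: &[ChangelogRow]) -> Option<ChangelogSummary> {
    let first = rows.first()?;
    let mut summary = ChangelogSummary {
        row_count: 0,
        min_key: first.key,
        max_key: first.key,
        min_sequence: first.sequence,
        max_sequence: first.sequence,
        retract_row_count: 0,
    };
    for row in rows {
        summary.row_count += 1;
        summary.min_key = summary.min_key.min(row.key);
        summary.max_key = summary.max_key.max(row.key);
        summary.min_sequence = summary.min_sequence.min(row.sequence);
        summary.max_sequence = summary.max_sequence.max(row.sequence);
        // Assumption: update-before and delete rows count as retractions.
        if matches!(row.kind, RowKind::UpdateBefore | RowKind::Delete) {
            summary.retract_row_count += 1;
        }
    }
    Some(summary)
}
```

The point of the sketch is that these statistics come from the changelog rows themselves, so they can differ from those of the deduplicated data file written from the same input.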
   
   ### Tests
   
   
     - `cargo fmt --all -- --check`
     - `cargo clippy -p paimon --all-targets -- -D warnings`
     - `cargo test -p paimon changelog`
     - `cargo test -p paimon`
   
   ### API and Format
   
   
   ### Documentation
   
   

