SteNicholas opened a new pull request, #7815:
URL: https://github.com/apache/paimon/pull/7815

   ### Purpose

   Support data evolution for `StreamWriteBuilder` by introducing
`StreamTableUpdate` that exposes `update_by_arrow_with_row_id` and
`upsert_by_arrow_with_key` APIs with `commit_identifier` support. Previously,
`StreamWriteBuilder.new_update()` raised
`ValueError("StreamWriteBuilder.new_update() not supported.")`, making
data-evolution table-update operations unavailable in streaming mode.

   Key changes:              
                                     
   - `TableUpdate` → `BatchTableUpdate` / `StreamTableUpdate` split: Refactored
`TableUpdate` into a shared base with `_update_by_arrow_with_row_id` and
`_upsert_by_arrow_with_key` internals that accept a `commit_identifier`.
`BatchTableUpdate` passes `BATCH_COMMIT_IDENTIFIER` to them (no
`commit_identifier` parameter); `StreamTableUpdate` requires an explicit
`commit_identifier` per call, mirroring the existing `TableWrite` /
`TableCommit` batch/stream pattern.
   - `TableCommit` cleanup: Moved the `_check_committed` / `batch_committed`
guard down from the shared `TableCommit` base into `BatchTableCommit`, since
stream-mode commits are reusable by design. Also improved the log message and
docstrings.
   - `TableUpdateByRowId` / `TableUpsertByKey`: Both now accept
`commit_identifier` via the constructor instead of hard-coding
`BATCH_COMMIT_IDENTIFIER` internally. `TableUpsertByKey._do_appends` now uses
`StreamTableWrite` instead of `BatchTableWrite` so that appended commit
messages carry the correct identifier.
   - `WriteBuilder`: `StreamWriteBuilder.new_update()` returns 
`StreamTableUpdate` instead of raising.                                         
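
   The batch/stream split listed above can be pictured with a minimal,
self-contained sketch. This is not the pypaimon implementation: the class and
method names follow this description, but the method bodies, the `recorded`
list, the placeholder value of `BATCH_COMMIT_IDENTIFIER`, and the plain dicts
standing in for Arrow tables are assumptions made for illustration. The real
methods may take additional parameters and return values.

```python
from typing import Any, List, Tuple

# Placeholder value for illustration only; the real constant lives in pypaimon.
BATCH_COMMIT_IDENTIFIER = -1


class TableUpdate:
    """Shared base: the internals always take an explicit commit_identifier."""

    def __init__(self) -> None:
        # Hypothetical log of (operation, commit_identifier, data) tuples.
        self.recorded: List[Tuple[str, int, Any]] = []

    def _update_by_arrow_with_row_id(self, table: Any, commit_identifier: int) -> None:
        self.recorded.append(("update_by_row_id", commit_identifier, table))

    def _upsert_by_arrow_with_key(self, table: Any, commit_identifier: int) -> None:
        self.recorded.append(("upsert_by_key", commit_identifier, table))


class BatchTableUpdate(TableUpdate):
    """Batch mode: no commit_identifier parameter, the constant is used."""

    def update_by_arrow_with_row_id(self, table: Any) -> None:
        self._update_by_arrow_with_row_id(table, BATCH_COMMIT_IDENTIFIER)

    def upsert_by_arrow_with_key(self, table: Any) -> None:
        self._upsert_by_arrow_with_key(table, BATCH_COMMIT_IDENTIFIER)


class StreamTableUpdate(TableUpdate):
    """Stream mode: every call must name the commit it belongs to."""

    def update_by_arrow_with_row_id(self, table: Any, commit_identifier: int) -> None:
        self._update_by_arrow_with_row_id(table, commit_identifier)

    def upsert_by_arrow_with_key(self, table: Any, commit_identifier: int) -> None:
        self._upsert_by_arrow_with_key(table, commit_identifier)
```

   In this sketch the only difference between the two subclasses is where the
identifier comes from, which is the same shape as the existing `TableWrite` /
`TableCommit` pair.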
                                                 
   
   ### Tests
   
   - New stream-mode test classes cover `update_by_arrow_with_row_id` and 
`upsert_by_arrow_with_key` under streaming, including: single-column update, 
multi-column update, partial-row update, partitioned tables, consecutive 
commits with correct snapshot identifiers, and concurrent stream commits.
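
   The consecutive-commit case depends on the guard move described under
"Key changes": a stream commit object has to accept more than one commit,
while a batch commit stays single-use. The sketch below illustrates that
behaviour with stand-in classes. The class, method, and attribute names come
from this description, but the bodies, the exception type and message, the
`committed_identifiers` list, and the placeholder constant value are
assumptions, and the real `commit` methods take further arguments such as
commit messages.

```python
from typing import List

# Placeholder value for illustration only; the real constant lives in pypaimon.
BATCH_COMMIT_IDENTIFIER = -1


class TableCommit:
    """Shared base: no single-use guard lives here any more."""

    def __init__(self) -> None:
        # Hypothetical record of the identifiers committed so far.
        self.committed_identifiers: List[int] = []

    def _commit(self, commit_identifier: int) -> None:
        self.committed_identifiers.append(commit_identifier)


class BatchTableCommit(TableCommit):
    """Batch mode: the guard allows exactly one commit per instance."""

    def __init__(self) -> None:
        super().__init__()
        self.batch_committed = False

    def _check_committed(self) -> None:
        if self.batch_committed:
            # Exception type and message are assumptions for this sketch.
            raise RuntimeError("BatchTableCommit can only commit once.")
        self.batch_committed = True

    def commit(self) -> None:
        self._check_committed()
        self._commit(BATCH_COMMIT_IDENTIFIER)


class StreamTableCommit(TableCommit):
    """Stream mode: reusable, each commit carries its own identifier."""

    def commit(self, commit_identifier: int) -> None:
        self._commit(commit_identifier)
```

   With these stand-ins, calling `StreamTableCommit.commit` with identifiers
1, 2, and 3 in turn succeeds each time, whereas a second
`BatchTableCommit.commit` call raises.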

