SteNicholas opened a new pull request, #7815:
URL: https://github.com/apache/paimon/pull/7815
### Purpose
Support data evolution for `StreamWriteBuilder` by introducing
`StreamTableUpdate` that exposes `update_by_arrow_with_row_id` and
`upsert_by_arrow_with_key` APIs with `commit_identifier` support. Previously,
`StreamWriteBuilder.new_update()` raised
`ValueError("StreamWriteBuilder.new_update() not supported.")`, making
data-evolution table-update operations unavailable in streaming mode.
Key changes:
- `TableUpdate` → `BatchTableUpdate` / `StreamTableUpdate` split: Refactored
`TableUpdate` into a shared base with `_update_by_arrow_with_row_id` and
`_upsert_by_arrow_with_key` internals that accept a `commit_identifier`.
`BatchTableUpdate` passes `BATCH_COMMIT_IDENTIFIER` to them (no `commit_identifier`
parameter); `StreamTableUpdate` requires an explicit `commit_identifier` per call,
mirroring the existing `TableWrite` / `TableCommit` batch/stream pattern.
- `TableCommit` cleanup: Moved `_check_committed` / `batch_committed` guard
down from the shared `TableCommit` base into `BatchTableCommit`, since
stream-mode commits are reusable by design. Improved log message and docstrings.
- `TableUpdateByRowId` / `TableUpsertByKey`: Both now accept
`commit_identifier` via constructor instead of hard-coding
`BATCH_COMMIT_IDENTIFIER` internally. `TableUpsertByKey._do_appends` switched
from `BatchTableWrite` to `StreamTableWrite` so that appended commit messages
carry the correct identifier.
- `WriteBuilder`: `StreamWriteBuilder.new_update()` returns
`StreamTableUpdate` instead of raising.
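
The batch/stream split and the relocated commit guard described above can be sketched roughly as follows. This is a self-contained toy model, not the code in this PR: every class name ending in `Sketch`, the tuple-shaped commit message, the `snapshots` list, and the sentinel value are assumptions made only for illustration.

```python
from typing import Any, Dict, List, Tuple

# Placeholder sentinel for this sketch only (assumption, not read from pypaimon).
BATCH_COMMIT_IDENTIFIER = 2**63 - 1

# In this toy model a "commit message" is just (commit_identifier, payload).
Message = Tuple[int, Dict[str, Any]]


class TableUpdateSketch:
    """Shared base: the internal method takes the identifier explicitly."""

    def _update_by_arrow_with_row_id(
        self, rows: Dict[str, Any], commit_identifier: int
    ) -> List[Message]:
        return [(commit_identifier, rows)]


class BatchTableUpdateSketch(TableUpdateSketch):
    def update_by_arrow_with_row_id(self, rows: Dict[str, Any]) -> List[Message]:
        # Batch mode: no identifier parameter, the sentinel is filled in.
        return self._update_by_arrow_with_row_id(rows, BATCH_COMMIT_IDENTIFIER)


class StreamTableUpdateSketch(TableUpdateSketch):
    def update_by_arrow_with_row_id(
        self, rows: Dict[str, Any], commit_identifier: int
    ) -> List[Message]:
        # Stream mode: the caller supplies an identifier on every call.
        return self._update_by_arrow_with_row_id(rows, commit_identifier)


class TableCommitSketch:
    """Shared base: no one-shot guard lives here."""

    def __init__(self) -> None:
        self.snapshots: List[int] = []  # identifiers of committed snapshots

    def _commit(self, messages: List[Message], commit_identifier: int) -> None:
        if messages:
            self.snapshots.append(commit_identifier)


class BatchTableCommitSketch(TableCommitSketch):
    def __init__(self) -> None:
        super().__init__()
        self.batch_committed = False

    def _check_committed(self) -> None:
        # The guard sits only on the batch subclass.
        if self.batch_committed:
            raise RuntimeError("batch commit is one-shot in this sketch")
        self.batch_committed = True

    def commit(self, messages: List[Message]) -> None:
        self._check_committed()
        self._commit(messages, BATCH_COMMIT_IDENTIFIER)


class StreamTableCommitSketch(TableCommitSketch):
    def commit(self, messages: List[Message], commit_identifier: int) -> None:
        # Reusable: may be called once per identifier, many times overall.
        self._commit(messages, commit_identifier)


# Two consecutive stream commits through the same update/commit objects.
stream_update = StreamTableUpdateSketch()
stream_commit = StreamTableCommitSketch()
for identifier in (1, 2):
    msgs = stream_update.update_by_arrow_with_row_id(
        {"row_id": [0], "age": [30 + identifier]}, identifier
    )
    stream_commit.commit(msgs, identifier)
```

In this model the stream commit object is reused across identifiers, while a second `commit` on the batch object raises, which is the behavioural difference the guard move is meant to preserve.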
### Tests
- New stream-mode test classes cover `update_by_arrow_with_row_id` and
`upsert_by_arrow_with_key` under streaming, including: single-column update,
multi-column update, partial-row update, partitioned tables, consecutive
commits with correct snapshot identifiers, and concurrent stream commits.