TheR1sing3un opened a new pull request, #7873:
URL: https://github.com/apache/paimon/pull/7873

   ## Summary
   
   Lay the protocol-level groundwork for upcoming compaction work in pypaimon 
by aligning `CommitMessage` with Java's `CommitMessageImpl` shape and adding a 
JSON-safe wire format for cross-process transport.
   
   **No observable behavior change today:** read / write / commit paths keep 
producing the same snapshots. This is the foundation for the follow-up PRs that 
ship the compaction module, append-only compaction, PK LSM compaction, and the 
Ray distributed executor.
   
   Split from #7771, which originally bundled all of Phases 1-4 into one 
~5000-line PR; this is the first of 6 smaller, focused PRs.
   
   ## What's in this PR
   
   **Structural changes**
   
   - New `DataIncrement` (write side) and `CompactIncrement` (compaction side) 
value objects, direct ports of `org.apache.paimon.io.DataIncrement` and 
`CompactIncrement`. Each holds `(new_files, deleted_files, changelog_files, 
new_index_files, deleted_index_files)` so future deletion-vector / changelog 
work has an unambiguous slot.
   - `CommitMessage` refactored to `(partition, bucket, total_buckets, 
data_increment, compact_increment, check_from_snapshot)`. Convenience 
properties (`new_files`, `compact_before`, `compact_after`, …) preserve 
read-site ergonomics.
   - `FileStoreCommit` emits ADD entries for `compact_after`, DELETE entries 
for `compact_before`, and auto-selects `commit_kind=COMPACT` when a message 
carries only compact increments. A dedicated `commit_compact()` helper enforces 
COMPACT-only semantics with no row-id assignment.
   - `FileStoreWrite` / `TableUpdate` construct `CommitMessage` via 
`DataIncrement` on the existing write path — no behavior change for current 
callers.
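
The message shape and the commit-side mapping described above can be sketched as follows. This is a minimal illustration, not the pypaimon implementation: file metas are plain strings, the changelog and index-file fields are omitted, the `compact_before` / `compact_after` fields on the toy `CompactIncrement` are borrowed from the convenience property names, and `to_entries`, `select_commit_kind`, and the `"APPEND"` fallback are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DataIncrement:
    # Write-side changes (simplified: changelog/index fields omitted).
    new_files: List[str] = field(default_factory=list)
    deleted_files: List[str] = field(default_factory=list)

    def is_empty(self) -> bool:
        return not (self.new_files or self.deleted_files)


@dataclass
class CompactIncrement:
    # Compaction-side changes; field names are an assumption for this sketch.
    compact_before: List[str] = field(default_factory=list)
    compact_after: List[str] = field(default_factory=list)

    def is_empty(self) -> bool:
        return not (self.compact_before or self.compact_after)


@dataclass
class CommitMessage:
    partition: Tuple
    bucket: int
    total_buckets: int
    data_increment: DataIncrement = field(default_factory=DataIncrement)
    compact_increment: CompactIncrement = field(default_factory=CompactIncrement)
    check_from_snapshot: Optional[int] = None  # type is assumed

    # Convenience properties keep read sites short.
    @property
    def new_files(self) -> List[str]:
        return self.data_increment.new_files

    @property
    def compact_before(self) -> List[str]:
        return self.compact_increment.compact_before

    @property
    def compact_after(self) -> List[str]:
        return self.compact_increment.compact_after


def to_entries(msg: CommitMessage) -> List[Tuple[str, str]]:
    """Hypothetical helper: map one message to (kind, file) manifest entries."""
    entries = [("ADD", f) for f in msg.new_files]
    entries += [("DELETE", f) for f in msg.compact_before]
    entries += [("ADD", f) for f in msg.compact_after]
    return entries


def select_commit_kind(messages: List[CommitMessage]) -> str:
    """Hypothetical helper: COMPACT only if every message is compact-only."""
    compact_only = bool(messages) and all(
        m.data_increment.is_empty() and not m.compact_increment.is_empty()
        for m in messages
    )
    return "COMPACT" if compact_only else "APPEND"
```

With this sketch, a message built with only `CompactIncrement(compact_before=["a"], compact_after=["b"])` yields one DELETE entry and one ADD entry and selects `COMPACT`, while a message built via `DataIncrement(new_files=[...])` behaves like the existing write path.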
   
   **DataFileMeta serialization**
   
   - `to_dict` / `from_dict` round-trip with tagged-value encoding for `bytes`, 
`Decimal`, `datetime`, `date`, `time`, and `Timestamp` so file metas can ship 
through JSON-only transports (Ray task payloads later).
   - Public `encode_value` / `decode_value` helpers reused by 
`CommitMessage.partition` (DATE / DECIMAL / bytes / Timestamp partitions).
   - Tolerates manifest-side `BinaryRow` (lazy-decoded) and pyarrow Array-like 
`null_counts` so round-tripping a freshly-produced file meta doesn't fail.
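
One way to picture the tagged-value encoding is the sketch below. Only the names `encode_value` / `decode_value` come from this PR; the tag key, the tag strings, and the string formats (base64 for `bytes`, ISO 8601 for temporal types) are assumptions made for illustration, and pypaimon's `Timestamp` type is left out because its API is not shown here.

```python
import base64
import json
from datetime import date, datetime, time
from decimal import Decimal

_TAG = "__type__"  # assumed tag key, not necessarily what the PR uses


def encode_value(value):
    """Wrap non-JSON-native values in a tagged dict; pass others through."""
    if isinstance(value, bytes):
        return {_TAG: "bytes", "v": base64.b64encode(value).decode("ascii")}
    if isinstance(value, Decimal):
        # str() keeps the scale, e.g. Decimal("1.50") -> "1.50"
        return {_TAG: "decimal", "v": str(value)}
    # datetime must be checked before date: datetime is a subclass of date.
    if isinstance(value, datetime):
        return {_TAG: "datetime", "v": value.isoformat()}
    if isinstance(value, date):
        return {_TAG: "date", "v": value.isoformat()}
    if isinstance(value, time):
        return {_TAG: "time", "v": value.isoformat()}
    return value


_DECODERS = {
    "bytes": base64.b64decode,
    "decimal": Decimal,
    "datetime": datetime.fromisoformat,
    "date": date.fromisoformat,
    "time": time.fromisoformat,
}


def decode_value(value):
    """Inverse of encode_value; untagged values are returned unchanged."""
    if isinstance(value, dict) and _TAG in value:
        return _DECODERS[value[_TAG]](value["v"])
    return value


# Round-trip a partition tuple through a JSON-only transport.
partition = (date(2024, 1, 1), Decimal("1.50"), b"\x00\xff", 7)
wire = json.dumps([encode_value(v) for v in partition])
restored = tuple(decode_value(v) for v in json.loads(wire))
assert restored == partition
```

Tagging each value, rather than inferring the type from the string, keeps decoding unambiguous: a plain string that happens to look like a date stays a string.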
   
   **CommitMessageSerializer**
   
   - VERSION=1 wire format covering the full `DataIncrement` + `CompactIncrement` 
shape (including `IndexFileMeta` identity fields). `dv_ranges` / 
`global_index_meta` will be wired up alongside the deletion-vector phases.
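
A versioned JSON envelope with unknown-version rejection might look like the sketch below. The envelope keys (`version`, `message`), the function names, and the payload layout are hypothetical; the actual `CommitMessageSerializer` wire format is defined by the PR's code, not by this example.

```python
import json

VERSION = 1


def serialize(payload: dict) -> str:
    """Wrap an already JSON-safe payload in a versioned envelope."""
    return json.dumps({"version": VERSION, "message": payload}, sort_keys=True)


def deserialize(data: str) -> dict:
    """Unwrap the envelope, rejecting any version this reader does not know."""
    envelope = json.loads(data)
    version = envelope.get("version")
    if version != VERSION:
        raise ValueError(f"Unsupported CommitMessage wire version: {version!r}")
    return envelope["message"]


# Hypothetical payload shaped after the fields listed in this PR.
payload = {
    "partition": ["2024-01-01"],
    "bucket": 0,
    "total_buckets": 1,
    "data_increment": {"new_files": ["data-0.parquet"], "deleted_files": []},
    "compact_increment": {"compact_before": [], "compact_after": []},
}
assert deserialize(serialize(payload)) == payload
```

Checking the version before touching the payload lets a reader fail fast on messages from a newer writer instead of silently misreading fields.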
   
   ## Test plan
   
   - [x] New `commit_message_serializer_test`: round-trip `CommitMessage` with 
`DataIncrement` / `CompactIncrement` / index files / non-JSON-native partition 
tuples (DATE, Decimal, bytes, Timestamp); IndexFileMeta round-trip; 
unknown-version rejection.
   - [x] New `file_store_commit_compact_test`: protocol-level coverage of 
`compact_before` → DELETE entry, `compact_after` → ADD entry, auto-COMPACT kind 
selection. Full e2e lands when the compactor lands (PR2+).
   - [x] Existing `file_store_commit_test` / `partition_predicate_test` / 
`table_commit_test` updated to construct `CommitMessage` via `DataIncrement` 
instead of the legacy `new_files=` kwarg (adapts to the signature change; no 
production behavior change).
   
   ## Note: no consumers yet
   
   `CompactIncrement` / `commit_compact` / `CommitMessageSerializer` introduced 
here have no callers in this PR; they are the foundation. The full split is 
listed below, with the next 5 PRs (items 2-6) building on top:
   
   1. **This PR** — commit protocol foundation
   2. Append-only compaction module + LocalExecutor
   3. MergeFunction abstraction (read-path refactor)
   4. Levels + UniversalCompaction strategy
   5. MergeTree compaction integration (PK e2e)
   6. Ray distributed executor
   
   Each follow-up will land separately for incremental review.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)

