SreeramGarlapati opened a new pull request, #2543: URL: https://github.com/apache/iceberg-rust/pull/2543
### Summary

- add `RewriteManifestsAction`, exposed as `Transaction::rewrite_manifests()`
- group live `DataFile` entries by partition tuple within the default partition spec, roll new manifests by `target_size_bytes`, and commit a `Replace` snapshot
- preserve `sequence_number`, `file_sequence_number`, and (v3) `first_row_id` via `ManifestWriter::add_existing_file`
- carry forward total-`*` summary keys; emit `manifests-created`, `manifests-replaced`, `manifests-kept`, `entries-processed`

### Why

`apache/iceberg-rust` had no manifest-compaction primitive. Long-running streaming/append workloads accumulate small manifests, which inflates planning cost. Java ships `BaseRewriteManifests` for this; this PR is the Rust analog at the transaction-primitive layer (per the architectural guidance in #1453).

### Scope

- format versions: v1, v2, v3
- the knobs included match their Java counterparts: `target_size_bytes` (default 8 MiB, mirrors `commit.manifest.target-size-bytes`), plus `snapshot_properties` / `commit_uuid` / `key_metadata` inherited via the builder
- only the default partition spec is rewritten; manifests bound to other specs and DELETE manifests are kept verbatim
- short-circuits to a no-op when there is nothing to merge

Out of scope (deferrable to follow-ups): `rewrite_if` predicate, `cluster_by`, custom `spec_id` / `staging_location`, `iceberg-datafusion` SQL-procedure layer.

### Tests

- 6 inline unit tests (no-current-snapshot error, single-small-manifest no-op, multi-manifest merge preserves sequence numbers on v2, target-size rolls multiple manifests, v3 row-lineage preserved, summary + `Replace` operation)
- `cargo test -p iceberg` (1302 passed, 0 failed)
- `cargo clippy -p iceberg --all-targets -- -D warnings`
- `cargo fmt --check`
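
For readers unfamiliar with manifest compaction, the group-then-roll step named in the summary can be illustrated with a small standalone sketch. It does not use the `iceberg` crate and is not the code in this PR: `Entry`, `encoded_len`, and `plan_manifests` are hypothetical names, the partition tuple is simplified to a vector of integers, and the per-entry size is assumed to be known up front. It also assumes a single rolling writer that visits partitions in sorted order, which may differ from what the PR does.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a live manifest entry (not the iceberg-rust type).
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    /// Partition tuple, simplified to integers for this sketch.
    pub partition: Vec<i64>,
    pub path: String,
    /// Carried over unchanged, mirroring "existing" entries.
    pub sequence_number: i64,
    /// Assumed serialized size of this entry in bytes.
    pub encoded_len: u64,
}

/// Groups entries by partition tuple, then packs them in partition order
/// into manifests whose summed `encoded_len` stays within `target_size_bytes`.
/// A single oversized entry still gets a manifest of its own.
pub fn plan_manifests(entries: Vec<Entry>, target_size_bytes: u64) -> Vec<Vec<Entry>> {
    // BTreeMap keeps partitions in a deterministic, sorted order.
    let mut groups: BTreeMap<Vec<i64>, Vec<Entry>> = BTreeMap::new();
    for entry in entries {
        groups.entry(entry.partition.clone()).or_default().push(entry);
    }

    let mut manifests = Vec::new();
    let mut current: Vec<Entry> = Vec::new();
    let mut current_len = 0u64;

    for group in groups.into_values() {
        for entry in group {
            // Roll to a new manifest when adding this entry would exceed the target.
            if !current.is_empty() && current_len + entry.encoded_len > target_size_bytes {
                manifests.push(std::mem::take(&mut current));
                current_len = 0;
            }
            current_len += entry.encoded_len;
            current.push(entry);
        }
    }
    if !current.is_empty() {
        manifests.push(current);
    }
    manifests
}
```

In this sketch, entries from the same partition end up adjacent, so a scan that prunes by partition would likely touch fewer manifests, and each entry's `sequence_number` passes through untouched.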
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
