SreeramGarlapati opened a new issue, #2556: URL: https://github.com/apache/iceberg-rust/issues/2556
### Problem `RewriteManifestsAction` (added in #2543) intentionally does **not** plug into `SnapshotProducer`. The producer path has four cross-cutting assumptions that block a clean integration today: 1. **`update_snapshot_summaries` rejects `Operation::Replace`** — `crates/iceberg/src/spec/snapshot_summary.rs:337-346` returns `ErrorKind::DataInvalid` for any operation other than `Append` / `Overwrite` / `Delete`. `SnapshotProducer::commit()` calls `update_snapshot_summaries` unconditionally, so the producer cannot emit a `Replace` snapshot at all today. 2. **`ManifestProcess::process_manifests` is synchronous** — `crates/iceberg/src/transaction/snapshot.rs:119-125`. Rewrite-manifests must write *new* manifests (async I/O via `ManifestWriter`), then return them as the post-process output. There is no async seam. 3. **Summary keys are driven by `added_data_files` only** — `crates/iceberg/src/transaction/snapshot.rs:373-379`. For rewrite-manifests `added_data_files` is empty by definition, so no `total-*` carry-forward happens, and there is no hook for the rewrite-specific summary keys (`manifests-created`, `manifests-replaced`, `manifests-kept`, `entries-processed`). 4. **`manifest_file()` shape mismatch** — the producer's `manifest_file()` assumes "if `added_data_files` non-empty, write one manifest from those files." Rewrite-manifests produces an N-to-M rewrite of *existing* manifest entries; the shape is fundamentally different. ### Scope Refactor once the underlying primitives land: - Add `Operation::Replace` (and `Operation::Delete`) to `update_snapshot_summaries`' allow-list, with summary-key carry-forward semantics that mirror Java `BaseRewriteManifests`. - Make `ManifestProcess::process_manifests` async (or add a parallel async hook) so producers can emit manifests written via `ManifestWriter`. - Decouple summary-key emission from `added_data_files` so rewrite-style ops can declare their own summary additions. - Migrate `RewriteManifestsAction` onto the unified path and delete the duplicated commit logic. Out of scope for this issue: the `rewrite_if` predicate, `cluster_by`, custom `spec_id`, custom `staging_location`, and the `iceberg-datafusion` SQL-procedure layer — all separately deferrable from #2543. ### References - Java reference: `apache/iceberg` `BaseRewriteManifests` + `SnapshotProducer` - Proposal: #1453 (closed) - Implementation: #2543 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
