SreeramGarlapati opened a new issue, #2556:
URL: https://github.com/apache/iceberg-rust/issues/2556

   ### Problem
   
   `RewriteManifestsAction` (added in #2543) intentionally does **not** plug 
into `SnapshotProducer`. The producer path has four cross-cutting assumptions 
that block a clean integration today:
   
   1. **`update_snapshot_summaries` rejects `Operation::Replace`** — 
`crates/iceberg/src/spec/snapshot_summary.rs:337-346` returns 
`ErrorKind::DataInvalid` for any operation other than `Append` / `Overwrite` / 
`Delete`. `SnapshotProducer::commit()` calls `update_snapshot_summaries` 
unconditionally, so the producer cannot emit a `Replace` snapshot at all today.
   2. **`ManifestProcess::process_manifests` is synchronous** — 
`crates/iceberg/src/transaction/snapshot.rs:119-125`. Rewrite-manifests must 
write *new* manifests (async I/O via `ManifestWriter`), then return them as the 
post-process output. There is no async seam.
   3. **Summary keys are driven by `added_data_files` only** — 
`crates/iceberg/src/transaction/snapshot.rs:373-379`. For rewrite-manifests 
`added_data_files` is empty by definition, so no `total-*` carry-forward 
happens, and there is no hook for the rewrite-specific summary keys 
(`manifests-created`, `manifests-replaced`, `manifests-kept`, 
`entries-processed`).
   4. **`manifest_file()` shape mismatch** — the producer's `manifest_file()` 
assumes "if `added_data_files` non-empty, write one manifest from those files." 
Rewrite-manifests produces an N-to-M rewrite of *existing* manifest entries; 
the shape is fundamentally different.
   
   ### Scope
   
   Refactor once the underlying primitives land:
   - Add `Operation::Replace` (and `Operation::Delete`) to 
`update_snapshot_summaries`' allow-list, with summary-key carry-forward 
semantics that mirror Java `BaseRewriteManifests`.
   - Make `ManifestProcess::process_manifests` async (or add a parallel async 
hook) so producers can emit manifests written via `ManifestWriter`.
   - Decouple summary-key emission from `added_data_files` so rewrite-style ops 
can declare their own summary additions.
   - Migrate `RewriteManifestsAction` onto the unified path and delete the 
duplicated commit logic.
   
   Out of scope for this issue: the `rewrite_if` predicate, `cluster_by`, 
custom `spec_id`, custom `staging_location`, and the `iceberg-datafusion` 
SQL-procedure layer — all separately deferrable from #2543.
   
   ### References
   
   - Java reference: `apache/iceberg` `BaseRewriteManifests` + 
`SnapshotProducer`
   - Proposal: #1453 (closed)
   - Implementation: #2543


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to