laskoviymishka opened a new issue, #1215: URL: https://github.com/apache/iceberg-go/issues/1215
## Goal

Add `Transaction.DynamicPartitionOverwrite`, the iceberg-go equivalent of Java's `ReplacePartitions` (`BaseReplacePartitions`) and PyIceberg's `Transaction.dynamic_partition_overwrite`. Given an Arrow table, it detects the partitions present in the incoming data, atomically deletes the existing data in exactly those partitions, and appends the new data, leaving untouched partitions alone.

## Background

A first attempt was made in #482, but it predates the partitioned-write and copy-on-write-overwrite machinery that has since landed, so most of that PR is now redundant:

- Partitioned writes are native: `recordsToDataFiles` routes to the partitioned fanout / clustered writers.
- Overwrite-by-filter already exists: `Transaction.Overwrite`, `performCopyOnWriteDeletion`, `mergeOverwrite`.
- Transform-aware predicate projection exists (`Transform.Project`), which removes the identity-transform-only limitation of the original attempt.

What remains is small and composable: derive a partition-matching predicate from the written data files, then drive the existing overwrite path. Landing it in reviewable slices rather than one drop.

## Scope (decomposable across PRs)

- **Phase 1: Partition-match predicate.** A transform-aware helper that turns a set of touched partition tuples plus the partition spec into a `BooleanExpression` selecting exactly those partitions. Standalone and unit-tested, no transaction wiring.
- **Phase 2: `DynamicPartitionOverwrite` API.** Compose the partitioned write, touched-partition collection via `DataFile.Partition()`, the Phase 1 predicate, and the existing copy-on-write overwrite. Resolve the deletion mechanism for non-identity transforms. Keep the unpartitioned and empty-table guards.
- **Phase 3: Happy-path & interop tests.** Multi-partition, null partition, copy-on-write vs merge-on-read, and a Spark round-trip for cross-engine parity.
- **Phase 4 (optional): Docs & example/CLI exposure.**

## Parity references

- Java: `org.apache.iceberg.BaseReplacePartitions`
- PyIceberg: `Transaction.dynamic_partition_overwrite`

Credit to @dttung2905 for the original implementation in #482.
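
## Sketch of the Phase 1 helper (illustrative only)

To make the Phase 1 shape concrete, here is a minimal, self-contained sketch of turning touched partition tuples into a "match exactly these partitions" predicate. Everything in it (`PartitionField`, `PartitionTuple`, `Predicate`, `PartitionMatchPredicate`) is a hypothetical stand-in and not the iceberg-go API. A real implementation would presumably emit a `BooleanExpression` and deal with transforms via `Transform.Project`, whereas this toy version compares already-transformed partition values directly and assumes they are comparable scalars.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// PartitionField is a hypothetical stand-in for one field of a partition spec.
type PartitionField struct{ Name string }

// PartitionTuple holds one value per partition field, in spec order.
// A nil entry stands for a null partition value.
type PartitionTuple []any

// Predicate is a toy expression: an OR over conjunctions, where each
// conjunction pins every partition field to one value (or to null).
type Predicate struct {
	fields []PartitionField
	tuples []PartitionTuple // distinct, in a deterministic order
}

// PartitionMatchPredicate deduplicates the touched tuples and wraps them in a
// Predicate. It rejects unpartitioned specs and tuples of the wrong arity.
func PartitionMatchPredicate(fields []PartitionField, touched []PartitionTuple) (Predicate, error) {
	if len(fields) == 0 {
		return Predicate{}, errors.New("table is unpartitioned")
	}
	seen := make(map[string]PartitionTuple)
	for _, t := range touched {
		if len(t) != len(fields) {
			return Predicate{}, fmt.Errorf("tuple has %d values, spec has %d fields", len(t), len(fields))
		}
		// %#v includes type information, so int 1 and string "1" get different keys.
		seen[fmt.Sprintf("%#v", t)] = t
	}
	keys := make([]string, 0, len(seen))
	for k := range seen {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	p := Predicate{fields: fields}
	for _, k := range keys {
		p.tuples = append(p.tuples, seen[k])
	}
	return p, nil
}

// Matches reports whether t equals one of the touched tuples, field by field.
// Values must be comparable (comparing e.g. two []byte values would panic).
func (p Predicate) Matches(t PartitionTuple) bool {
	for _, want := range p.tuples {
		if len(t) != len(want) {
			continue
		}
		equal := true
		for i := range want {
			if t[i] != want[i] {
				equal = false
				break
			}
		}
		if equal {
			return true
		}
	}
	return false
}

// String renders the predicate; with no touched partitions it is "false",
// meaning nothing would be selected for deletion.
func (p Predicate) String() string {
	if len(p.tuples) == 0 {
		return "false"
	}
	ors := make([]string, 0, len(p.tuples))
	for _, t := range p.tuples {
		ands := make([]string, 0, len(t))
		for i, v := range t {
			if v == nil {
				ands = append(ands, p.fields[i].Name+" IS NULL")
			} else {
				ands = append(ands, fmt.Sprintf("%s = %#v", p.fields[i].Name, v))
			}
		}
		ors = append(ors, "("+strings.Join(ands, " AND ")+")")
	}
	return strings.Join(ors, " OR ")
}
```

With fields `region` and `day`, a single touched tuple `{"us", nil}` renders as `(region = "us" AND day IS NULL)`, and an empty input renders as `false`. Null partitions get their own `IS NULL` branch rather than an equality, which is the case the Phase 3 null-partition test is meant to cover.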
