laskoviymishka commented on issue #602: URL: https://github.com/apache/iceberg-go/issues/602#issuecomment-4063879892
I've been looking into this for a while, and I think it's more feasible than it might seem: iceberg-go already has most of the plumbing in place. The `snapshotProducer` supports data files + position delete files in one snapshot, `DataFileBuilder.EqualityFieldIDs()` exists, and the metrics infrastructure already handles equality deletes.

My thinking is to approach this incrementally:

1. Start with the RowDelta API surface itself: a builder on `Transaction` that commits data files + delete files in one snapshot. This would work immediately with position deletes, which are already supported end-to-end.
2. Then add equality delete file writing (the writer, schema projection, wiring into the snapshot producer).
3. Then add equality delete reading in the scanner (the hardest part: hash-based anti-join + sequence number filtering).

I have a concrete use case driving this: CDC replication from Postgres to Iceberg via [Transferia](https://github.com/transferia/iceberg) ([transferia/iceberg#4](https://github.com/transferia/iceberg/issues/4)).

I'm working on a related multi-table commit feature (#784) right now and plan to pick this up next. Happy to share a more detailed breakdown or discuss API design before starting.
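To make the scanner-side step concrete (hash-based anti-join + sequence number filtering), here is a minimal sketch of the idea in plain Go. It is not iceberg-go code: `Row`, `EqualityDelete`, `keyOf`, and `ApplyEqualityDeletes` are hypothetical names invented for illustration, rows are modeled as in-memory maps rather than Arrow batches, and partition matching is left out. The one rule taken from the Iceberg table spec is that an equality delete applies only to data files whose data sequence number is strictly less than the delete file's.

```go
package main

import (
	"fmt"
	"strings"
)

// Row is a hypothetical decoded record: field ID -> value.
type Row map[int]any

// EqualityDelete is a hypothetical in-memory view of one equality delete file.
type EqualityDelete struct {
	SeqNum      int64 // data sequence number of the delete file
	EqualityIDs []int // field IDs that form the equality key
	Rows        []Row // key tuples to delete
}

// keyOf builds a hashable string key from the given field IDs of a row.
// A missing or nil value is rendered like any other, so null matches null.
func keyOf(row Row, ids []int) string {
	var b strings.Builder
	for _, id := range ids {
		fmt.Fprintf(&b, "%d=%#v;", id, row[id])
	}
	return b.String()
}

// ApplyEqualityDeletes returns the rows of one data file (with data sequence
// number dataSeq) that survive the given equality deletes.
func ApplyEqualityDeletes(dataSeq int64, rows []Row, deletes []EqualityDelete) []Row {
	type index struct {
		ids  []int
		keys map[string]struct{}
	}

	// Build side: one hash set per applicable delete file.
	var indexes []index
	for _, d := range deletes {
		// Sequence number filter: skip deletes that are not strictly newer
		// than the data file.
		if dataSeq >= d.SeqNum {
			continue
		}
		idx := index{ids: d.EqualityIDs, keys: make(map[string]struct{}, len(d.Rows))}
		for _, r := range d.Rows {
			idx.keys[keyOf(r, d.EqualityIDs)] = struct{}{}
		}
		indexes = append(indexes, idx)
	}

	// Probe side (anti-join): keep a row only if no hash set contains its key.
	kept := make([]Row, 0, len(rows))
outer:
	for _, r := range rows {
		for _, idx := range indexes {
			if _, hit := idx.keys[keyOf(r, idx.ids)]; hit {
				continue outer
			}
		}
		kept = append(kept, r)
	}
	return kept
}
```

A real implementation would presumably hash typed column values directly instead of formatting them into strings, and would group delete files that share the same equality field IDs into one set; the string key here just keeps the sketch short.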
