rdblue commented on issue #360: URL: https://github.com/apache/iceberg/issues/360#issuecomment-653816853
@openinx, I agree with the guarantees that you propose for the reconstructed CDC stream. It sounds like the first solution, with mixed equality and position deletes, is probably the design we should use since it will have good read performance and good write performance, with the cost being the ID to position map we need while a data file is open.

For `UPSERT`, I think the main difference is that we don't have the deleted column data, so the stream provides a slightly different guarantee when it is replayed.

I don't think that we need to keep track of the primary key column in the position delete file. Isn't that table-level configuration that won't change across data files?
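
To make the cost of the ID to position map concrete, here is a rough sketch of how a writer might choose between the two delete types. This is only an illustration under assumptions, not Iceberg code: `SketchWriter`, its fields, and the file name are all hypothetical, and the primary key is passed in by the caller on the assumption that it is table-level configuration. The idea is that a key written to the currently open data file can be deleted by position, while a key not found in the map falls back to an equality delete.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class SketchWriter:
    """Hypothetical writer for a single open data file (not an Iceberg API)."""
    path: str
    rows: List[Tuple[Any, dict]] = field(default_factory=list)
    # ID -> row position; only needed while this data file is open.
    key_to_pos: Dict[Any, int] = field(default_factory=dict)
    position_deletes: List[Tuple[str, int]] = field(default_factory=list)
    equality_deletes: List[Any] = field(default_factory=list)

    def insert(self, key: Any, row: dict) -> None:
        # Remember where the key landed so a later delete can target it.
        self.key_to_pos[key] = len(self.rows)
        self.rows.append((key, row))

    def delete(self, key: Any) -> None:
        pos = self.key_to_pos.pop(key, None)
        if pos is not None:
            # Row is in the open file: record (file path, position).
            self.position_deletes.append((self.path, pos))
        else:
            # Row, if it exists, is in an older file: delete by key value.
            self.equality_deletes.append(key)

    def upsert(self, key: Any, row: dict) -> None:
        # No old column values are available, only the key.
        self.delete(key)
        self.insert(key, row)


w = SketchWriter("data-001.parquet")
w.insert(1, {"v": "a"})
w.insert(2, {"v": "b"})
w.delete(1)               # key 1 is in the open file: position delete
w.delete(99)              # key 99 is unknown here: equality delete
w.upsert(2, {"v": "c"})   # position delete for the old row, then a new row
```

Note that the position deletes in this sketch carry only a file path and a row position, never the key column, and that the map can be dropped as soon as the data file is closed.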
