rdblue commented on issue #360:
URL: https://github.com/apache/iceberg/issues/360#issuecomment-653816853


   @openinx, I agree with the guarantees that you propose for the reconstructed 
CDC stream. It sounds like the first solution, with mixed equality and position 
deletes, is probably the design we should use, since it will have good read 
performance and good write performance. The cost is the ID-to-position map 
that we need to keep while a data file is open.
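
   To make the trade-off concrete, here is a minimal sketch of how a writer 
could use an ID-to-position map for one open data file. It is an illustration 
under assumptions, not Iceberg's API: `SketchWriter`, its fields, and the file 
name are hypothetical. The assumed behavior is that a delete for a key written 
to the currently open file becomes a position delete, and a delete for any 
other key falls back to an equality delete.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class SketchWriter:
    """Hypothetical writer for a single open data file (not Iceberg's API)."""

    data_file: str
    rows: List[Tuple[Any, dict]] = field(default_factory=list)
    # ID -> row position; only needed while this data file is open.
    key_to_pos: Dict[Any, int] = field(default_factory=dict)
    position_deletes: List[Tuple[str, int]] = field(default_factory=list)
    equality_deletes: List[Any] = field(default_factory=list)

    def insert(self, key: Any, row: dict) -> None:
        # Remember where this key lands in the open file.
        self.key_to_pos[key] = len(self.rows)
        self.rows.append((key, row))

    def delete(self, key: Any) -> None:
        pos = self.key_to_pos.pop(key, None)
        if pos is not None:
            # Key was written to this file: record (file, position).
            self.position_deletes.append((self.data_file, pos))
        else:
            # Key lives in some earlier file: record an equality delete.
            self.equality_deletes.append(key)


writer = SketchWriter("data-001.parquet")  # hypothetical file name
writer.insert(1, {"v": "a"})
writer.insert(2, {"v": "b"})
writer.delete(1)  # written to this file, so a position delete
writer.delete(7)  # never written to this file, so an equality delete
```

   In this sketch the map holds one entry per live key in the open file, which 
is the memory cost mentioned above; it can be dropped when the file is closed.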
   
   For `UPSERT`, I think the main difference is that we don't have the deleted 
column data, so the stream provides a slightly different guarantee when it is 
replayed. I don't think that we need to keep track of the primary key column in 
the position delete file. Isn't that a table-level configuration that won't 
change across data files?

