rdblue edited a comment on issue #360:
URL: https://github.com/apache/iceberg/issues/360#issuecomment-653281824


   I think we're in agreement on a few points for moving forward:
   
   * We will use a static schema for equality deletes
   * We need to be able to reconstruct an equivalent stream of changes for
     streaming CDC pipelines
   * We should add a way to encode all of the columns for an equality delete
     and identify the subset used for deletion (for efficiency)
   * We may need to add a way to encode all columns into position delete files
     (if used for CDC)
   * For the CDC case, we'll first assume that we have the entire deleted row
     in delete events
   * We should handle a stream of upserts as a separate use case
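
   To make the third and fifth points concrete, here is a minimal sketch (not Iceberg's actual API or file format) of an equality delete that carries the entire deleted row while only a subset of columns is used for matching. The names `apply_equality_deletes`, `to_change_events`, and the column names are hypothetical, and sequence numbers are ignored for simplicity.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def apply_equality_deletes(
    rows: List[Row], deletes: List[Row], equality_columns: Tuple[str, ...]
) -> List[Row]:
    """Drop every data row whose equality columns match some delete row.

    Each delete row holds ALL columns, but only `equality_columns`
    (a hypothetical, statically chosen subset) are compared.
    """
    deleted_keys = {tuple(d[c] for c in equality_columns) for d in deletes}
    return [
        r for r in rows
        if tuple(r[c] for c in equality_columns) not in deleted_keys
    ]


def to_change_events(deletes: List[Row]) -> List[Tuple[str, Row]]:
    """Rebuild CDC delete events from the full rows stored in the deletes."""
    return [("DELETE", dict(d)) for d in deletes]


rows = [
    {"id": 1, "name": "a", "qty": 10},
    {"id": 2, "name": "b", "qty": 20},
    {"id": 3, "name": "c", "qty": 30},
]
# The delete stores the whole row; only "id" is used to find matches.
deletes = [{"id": 2, "name": "b", "qty": 20}]

remaining = apply_equality_deletes(rows, deletes, ("id",))
events = to_change_events(deletes)
```

   Because the delete keeps every column, a downstream consumer can emit the full deleted row without re-reading the data file, while matching still touches only the `id` column.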
   
   The doc describes ways to use both equality and position deletes for CDC
   streams. It sounds like equality deletes would be ideal if (1) events have
   a unique ID and (2) the execution has exactly-once semantics. Otherwise, I
   think it is possible to use position deletes. Which do you plan to target?
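
   As a rough illustration of the difference (one reading of why unique IDs and exactly-once execution matter, not a description of Iceberg's implementation): an equality delete removes every row that matches the key, while a position delete removes exactly one row at a given file and offset. The file names and helper functions below are hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple

Row = Dict[str, Any]
DataFiles = Dict[str, List[Row]]  # file path -> rows in file order


def scan_with_position_deletes(
    files: DataFiles, position_deletes: Set[Tuple[str, int]]
) -> List[Row]:
    """Skip only the rows named by (file path, row position)."""
    return [
        row
        for path, rows in files.items()
        for pos, row in enumerate(rows)
        if (path, pos) not in position_deletes
    ]


def scan_with_equality_deletes(
    files: DataFiles, column: str, deleted_values: Set[Any]
) -> List[Row]:
    """Skip every row whose `column` value appears in the delete set."""
    return [
        row
        for rows in files.values()
        for row in rows
        if row[column] not in deleted_values
    ]


# id=2 was written twice, e.g. by a replayed event without exactly-once
# guarantees (an assumed scenario for this sketch).
files = {
    "data-a.parquet": [{"id": 1, "v": "x"}, {"id": 2, "v": "y"}],
    "data-b.parquet": [{"id": 2, "v": "y"}],
}

after_position = scan_with_position_deletes(files, {("data-a.parquet", 1)})
after_equality = scan_with_equality_deletes(files, "id", {2})
```

   Here the position delete leaves the copy in `data-b.parquet` in place, whereas the equality delete removes both copies of `id=2`.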


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



