coolderli commented on issue #3941:
URL: https://github.com/apache/iceberg/issues/3941#issuecomment-1064821091


   Just sharing my thoughts.
   
   - Output only delete and insert record types by default
   Constructing pre/post images may be necessary. If the downstream is an 
external system that can handle an insert as an upsert, the 
update_before (delete) can be dropped. But if the downstream requires an 
aggregation such as sum, the update_before cannot be dropped.
   In the current implementation the records carry no metadata such as 
create_timestamp, so we can't determine when a deletion or an insertion 
happened, and we may have to emit deletes before inserts. But this is 
unacceptable: it causes the data to jitter, and users will see a value 
decrease and then return to normal.
   
   - How to deal with changing identity columns?
   Maybe we need to consider adding a restriction, such as always storing all 
rows in equality delete files when there is a streaming read. Then we might be 
able to use the latest primary key.
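
   To illustrate the first point, here is a minimal sketch of why 
update_before matters for aggregation but not for an upsert sink. It is plain 
Python, not Iceberg or Flink code; the `Change` tuple, the op names, and the 
helper functions are hypothetical and exist only for this example.

```python
from collections import namedtuple

# Hypothetical changelog record: op is one of
# INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE.
Change = namedtuple("Change", ["op", "key", "value"])

# Key "a" is inserted with value 10, then updated to 15.
changelog = [
    Change("INSERT", "a", 10),
    Change("UPDATE_BEFORE", "a", 10),
    Change("UPDATE_AFTER", "a", 15),
]


def drop_update_before(changes):
    """Return the changelog without UPDATE_BEFORE records."""
    return [c for c in changes if c.op != "UPDATE_BEFORE"]


def apply_upsert(changes):
    """Sink that treats inserts as upserts keyed by `key`."""
    table = {}
    for c in changes:
        if c.op in ("INSERT", "UPDATE_AFTER"):
            table[c.key] = c.value
        elif c.op == "DELETE":
            table.pop(c.key, None)
        # UPDATE_BEFORE is ignored: the following upsert overwrites the row.
    return table


def running_sum(changes):
    """Downstream sum: retractions subtract, additions add."""
    total = 0
    for c in changes:
        if c.op in ("INSERT", "UPDATE_AFTER"):
            total += c.value
        elif c.op in ("DELETE", "UPDATE_BEFORE"):
            total -= c.value
    return total


full = changelog
trimmed = drop_update_before(changelog)

print(apply_upsert(full), apply_upsert(trimmed))  # same final table
print(running_sum(full), running_sum(trimmed))    # sums differ
```

   In this sketch the upsert sink ends with the same table either way, while 
the sum double-counts the row once the update_before record is dropped.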


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


