coolderli commented on issue #3941: URL: https://github.com/apache/iceberg/issues/3941#issuecomment-1064821091
Just sharing my thoughts.

- Output only delete and insert record types by default

  Constructing pre/post images may be necessary. If the downstream is an external system that can handle an insert as an upsert, the update_before (delete) record can be dropped. But if the downstream performs an aggregation such as sum, the update_before record cannot be dropped. In the current implementation the records carry no metadata such as create_timestamp, so we can't determine the time of the deletion and the insertion, and maybe we have to emit the delete before the insert. But this is unacceptable: it causes the data to jitter, and users will see the value decrease and then return to normal.

- How to deal with changing identity columns?

  Maybe we need to consider adding some restrictions, such as always storing all rows in equality delete files when there is a streaming read. Then maybe we can use the latest primary key.
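To make the first point concrete, here is a minimal sketch of how the two kinds of downstream behave. The record shape `(op, key, amount)` and the helper names `running_sum` and `upsert_state` are made up for illustration; they are not an Iceberg or Flink API.

```python
# Hypothetical changelog for one update of key "a" from 10 to 12.
changelog = [
    ("INSERT", "a", 10),
    ("INSERT", "b", 5),
    ("UPDATE_BEFORE", "a", 10),  # retraction of the old row (the "delete")
    ("UPDATE_AFTER", "a", 12),   # the new row (the "insert")
]


def running_sum(records, drop_update_before=False):
    """Return the sum a downstream aggregation would show after each record."""
    total = 0
    seen = []
    for op, _key, amount in records:
        if op == "UPDATE_BEFORE" and drop_update_before:
            continue  # simulate a source that does not emit the pre-image
        if op in ("INSERT", "UPDATE_AFTER"):
            total += amount
        elif op in ("DELETE", "UPDATE_BEFORE"):
            total -= amount
        seen.append(total)
    return seen


def upsert_state(records):
    """Return the final state of a key-value sink that treats inserts as upserts."""
    state = {}
    for op, key, amount in records:
        if op in ("INSERT", "UPDATE_AFTER"):
            state[key] = amount
        elif op == "DELETE":
            state.pop(key, None)
        # UPDATE_BEFORE is ignored: the following UPDATE_AFTER overwrites the key.
    return state


print(upsert_state(changelog))                          # {'a': 12, 'b': 5}
print(running_sum(changelog))                           # [10, 15, 5, 17]
print(running_sum(changelog, drop_update_before=True))  # [10, 15, 27]
```

The upsert sink ends up correct without the update_before record. The sum is only correct (17) when the update_before record is applied, and it dips to 5 in between, which is the jitter described above. Dropping update_before leaves the sum at 27.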
