rainerschamm commented on PR #14797: URL: https://github.com/apache/iceberg/pull/14797#issuecomment-4010638258
> > > @t3hw @rainerschamm In my testing there are still duplicated records if a record is updated frequently. My commit time is 3 minutes. Below are two records updated within one minute; they are the only two duplicated records in the table. There are 434192 records in the table with 434191 distinct id records.
> > >
> > > ## updated_at
> > > 2026-03-05 03:37:58.685000
> > > 2026-03-05 03:37:59.076000
> >
> > Hmm, we have not seen any duplicates yet in our tests, but we only tested it in this setup:
> >
> > * no partitioning
> > * merge-on-read
> > * 5 minute commit
> >
> > ...
> > iceberg.tables.auto-create-props.write.delete.mode: merge-on-read
> > iceberg.tables.auto-create-props.write.merge.mode: merge-on-read
> > iceberg.tables.auto-create-props.write.update.mode: merge-on-read
> > ...
> >
> > Also, we make sure all identifier fields are strictly non-null in the resulting Iceberg table schema.
>
> @rainerschamm Do you mind sharing the complete sink properties? Let me check whether anything is wrong with my config. It does not always produce duplicates; it only adds a couple of duplicate records every day. I have no idea how to troubleshoot it.

I can't provide the AWS S3 Tables ones since they are for a customer, but here are the ones we use for demos. I hit these tables quite hard with random inserts and updates in Postgres, and they generated no duplicates in Iceberg.

[debezium-source-connector.yaml](https://github.com/user-attachments/files/25790651/debezium-source-connector.yaml)
[iceberg-sink-connector.yaml](https://github.com/user-attachments/files/25790652/iceberg-sink-connector.yaml)

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
