rainerschamm commented on PR #14797:
URL: https://github.com/apache/iceberg/pull/14797#issuecomment-4010638258

   > > > @t3hw @rainerschamm In my testing there are still duplicate records 
if records are updated frequently. My commit interval is 3 minutes. Below are 
two records updated within one minute; they are the only two duplicated records 
in the table. There are 434192 records in the table with 434191 distinct id 
values.
   > > > ## updated_at
   > > > 2026-03-05 03:37:58.685000 2026-03-05 03:37:59.076000
   > > 
   > > 
   > > Hmm, we have not seen any duplicates yet in our tests but we only tested 
it in this setup:
   > > 
   > > * no partitioning
   > > * merge-on-read
   > > * 5 minute commit
   > > 
   > > ...
   > > iceberg.tables.auto-create-props.write.delete.mode: merge-on-read
   > > iceberg.tables.auto-create-props.write.merge.mode: merge-on-read
   > > iceberg.tables.auto-create-props.write.update.mode: merge-on-read
   > > ...
   > > 
   > > Also we make sure all identifier fields are strictly non-null in the 
resulting iceberg table schema.
   > 
   > @rainerschamm Do you mind sharing the complete sink properties? Let me 
check whether anything is wrong with my config. Duplicates do not always occur; 
only a couple of duplicate records are added each day. I have no idea how to 
troubleshoot it.
   
   I can't provide the AWS S3 Tables ones since they are for a customer, but 
here are the ones we use for demos.
   
   I hit these tables quite hard with random inserts and updates in Postgres 
and they generated no duplicates in Iceberg.
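
   For anyone who wants to repeat the duplicate check discussed above, here is a 
minimal sketch of the idea, not the exact check used in either test setup. It 
assumes the table rows have already been read into a list of dicts; the 
function name `find_duplicate_ids`, the `id` field name, and the sample rows are 
hypothetical and only for illustration.

```python
from collections import Counter


def find_duplicate_ids(rows, id_field="id"):
    """Return {id: count} for every identifier that appears more than once."""
    counts = Counter(row[id_field] for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample rows, standing in for a scan of the Iceberg table.
rows = [
    {"id": 1, "updated_at": "2026-03-05 03:37:58.685000"},
    {"id": 1, "updated_at": "2026-03-05 03:37:59.076000"},
    {"id": 2, "updated_at": "2026-03-05 03:40:00.000000"},
]

duplicates = find_duplicate_ids(rows)

# Total rows minus distinct ids gives the number of surplus rows,
# the same arithmetic as 434192 records vs. 434191 distinct ids.
extra_rows = len(rows) - len({row["id"] for row in rows})

print(duplicates, extra_rows)
```

   Comparing the total row count with the distinct id count only says how many 
surplus rows exist; the per-id counts show which identifiers to inspect.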
   
   
[debezium-source-connector.yaml](https://github.com/user-attachments/files/25790651/debezium-source-connector.yaml)
   
[iceberg-sink-connector.yaml](https://github.com/user-attachments/files/25790652/iceberg-sink-connector.yaml)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

