rainerschamm commented on PR #14797:
URL: https://github.com/apache/iceberg/pull/14797#issuecomment-4005314054
> @t3hw @rainerschamm In my testing there are still duplicated records if
> records are updated frequently. My commit time is 3 minutes. Below are two
> records updated within one minute; they are the only two duplicated records in
> the table. There are 434192 records in the table with 434191 distinct id values.
> updated_at
> 2026-03-05 03:37:58.685000
> 2026-03-05 03:37:59.076000
Hmm, we have not seen any duplicates yet in our tests, but we have only tested
it in this setup:
- no partitioning
- merge-on-read
- 5 minute commit
...
iceberg.tables.auto-create-props.write.delete.mode: merge-on-read
iceberg.tables.auto-create-props.write.merge.mode: merge-on-read
iceberg.tables.auto-create-props.write.update.mode: merge-on-read
...
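For anyone reproducing this setup from a script, here is a minimal sketch that builds just the three write-mode properties listed above. The helper name `merge_on_read_props` is hypothetical, and the rest of the connector config (elided above) is deliberately not filled in.

```python
# Prefix shared by the three properties quoted above.
PREFIX = "iceberg.tables.auto-create-props."


def merge_on_read_props():
    """Return the three write-mode properties from the setup above.

    Hypothetical helper for illustration only; it covers nothing
    beyond the three keys shown in the comment.
    """
    return {
        f"{PREFIX}write.{op}.mode": "merge-on-read"
        for op in ("delete", "merge", "update")
    }


props = merge_on_read_props()
for key, value in sorted(props.items()):
    print(f"{key}: {value}")
```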
Also, we make sure all identifier fields are strictly non-null in the
resulting Iceberg table schema.
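
To compare setups, a check along these lines might help. It is only a sketch: it assumes the rows have already been read from the table into plain dicts (how you read them is up to your query engine), and the function name and sample data are hypothetical, not taken from either test.

```python
from collections import Counter


def check_identifier_fields(rows, id_fields):
    """Return (rows_with_null_ids, duplicated_keys) for the given rows.

    rows      -- iterable of dicts, one per table row (assumed input shape)
    id_fields -- names of the identifier fields to check
    """
    rows = list(rows)
    # Rows where at least one identifier field is missing or None.
    null_rows = [r for r in rows if any(r.get(f) is None for f in id_fields)]
    # Count how often each fully populated identifier key occurs.
    keys = Counter(
        tuple(r[f] for f in id_fields)
        for r in rows
        if all(r.get(f) is not None for f in id_fields)
    )
    duplicates = {k: n for k, n in keys.items() if n > 1}
    return null_rows, duplicates


# Hypothetical sample data, not from the table discussed above.
rows = [
    {"id": 1, "updated_at": "t1"},
    {"id": 2, "updated_at": "t2"},
    {"id": 2, "updated_at": "t3"},
    {"id": None, "updated_at": "t4"},
]
null_rows, duplicates = check_identifier_fields(rows, ["id"])
print(len(null_rows), duplicates)
```

A non-empty `duplicates` result means the same identifier key appears more than once; a non-empty `null_rows` result means the non-null assumption on identifier fields does not hold for that data.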
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]