uthanuja opened a new issue, #16685: URL: https://github.com/apache/iceberg/issues/16685
### Apache Iceberg version

None

### Query engine

None

### Please describe the bug

Hi Iceberg community,

I'm investigating a persistent duplicate-key issue and wanted to check whether it matches any known Iceberg behavior.

Context:
- We ingest parquet data into an Iceberg table using data-load and incremental-load jobs.
- Source parquet for the affected ID has a single record (verified).
- After ingest, the Iceberg table shows exactly one duplicated ID key (same specific ID).
- We deleted that ID from the branch/table and ingested again, but the same duplicate reappeared.

Observed behavior:
- This is not a broad duplicate problem; it is isolated to one ID key.
- Snapshot row counts change across commits, and only this key is duplicated intermittently.
- Ingest is expected to be idempotent for this key pattern, but it is not.

Runtime/version detail:
- Spark: 3.5.x
- Iceberg: 1.9.2

Questions:
- Are there known cases where MERGE/UPSERT into Iceberg can produce a persistent single-key duplicate like this?
- What metadata checks should we run/share first to determine if this is Iceberg-level vs ingest-logic-level?

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
