[I] Unexpected duplicate ID after ingest: input parquet has single ID, but Iceberg table keeps one duplicated key [iceberg]

via GitHub Thu, 04 Jun 2026 21:46:02 -0700


uthanuja opened a new issue, #16685:
URL: https://github.com/apache/iceberg/issues/16685


   ### Apache Iceberg version
   
   None
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   Hi Iceberg community,
   
   I’m investigating a persistent duplicate-key issue and wanted to check if 
this matches any known Iceberg behavior.
   
   Context:
   - We ingest parquet data into an Iceberg table using data-load and 
incremental-load jobs.
   - Source parquet for the affected ID has a single record (verified).
   - After ingest, the Iceberg table contains duplicate rows for exactly one ID 
key (the same specific ID).
   - We deleted that ID from the branch/table and ingested again, but the same 
duplicate reappeared.
   
   Observed behavior:
   - This is not a broad duplicate problem; it is isolated to one ID key.
   - Snapshot row counts change across commits, and only this key is duplicated 
intermittently.
   - Ingest is expected to be idempotent for this key pattern, but it is not.
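
The check behind these observations can be sketched as below. This is a minimal illustration in plain Python, not the actual ingest or verification code; the key column name `id` and the sample rows are hypothetical stand-ins for the source parquet and the table contents after ingest.

```python
from collections import Counter


def duplicated_keys(rows, key="id"):
    """Return {key_value: count} for every key value that appears more than once."""
    counts = Counter(row[key] for row in rows)
    return {k: n for k, n in counts.items() if n > 1}


# Hypothetical rows: the source has one record per ID,
# while the table after ingest holds one ID twice.
source_rows = [{"id": "A1"}, {"id": "B2"}]
table_rows = [{"id": "A1"}, {"id": "B2"}, {"id": "B2"}]

print(duplicated_keys(source_rows))  # duplicates found in the source
print(duplicated_keys(table_rows))   # duplicates found in the table
```

Running the same function against both sides separates "the source already had the duplicate" from "the duplicate appeared during or after ingest".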
   
   Runtime/version detail:
   - Spark: 3.5.x
   - Iceberg: 1.9.2
   
   Questions:
   
   - Are there known cases where MERGE/UPSERT into Iceberg can produce a 
persistent single-key duplicate like this?
   - What metadata checks should we run/share first to determine if this is 
Iceberg-level vs ingest-logic-level?
   
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


