anais-source opened a new issue, #15305:
URL: https://github.com/apache/iceberg/issues/15305

   ### Apache Iceberg version
   
   1.10.1 (latest release)
   
   ### Query engine
   
   Flink
   
   ### Please describe the bug 🐞
   
   ### Description
   We observed a reproducible issue in a Flink + Iceberg upsert pipeline: 
equality deletes are written, but the deleted rows remain visible to readers 
(Flink SQL, Trino, StarRocks).
   
   After investigation, we found the following in the metadata for the same key/partition:
   
   - one data file (content=0)
   - one equality delete file (content=2)
   - both with the same sequence_number, in the same snapshot/commit
   
   Because an equality delete applies only to data files with a strictly lower 
data sequence number (not an equal one), the delete does not remove the 
co-committed data row.
   
   So this is not a key-mismatch issue. It is a write-semantics issue: a data 
row and an equality delete for the same key can end up at the same sequence number.
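The sequence-number rule described above can be illustrated with a small Python model. This is a simplified sketch of the equality-delete rule in the Iceberg table spec (a delete applies only to data files with a strictly lower data sequence number, in the same partition unless the delete file's spec is unpartitioned). `FileEntry` and `equality_delete_applies` are hypothetical names, not Iceberg APIs, and the sequence number 7 is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical, simplified view of a manifest entry (not an Iceberg class)."""
    content: int                      # 0 = data file, 2 = equality delete file
    data_sequence_number: int
    spec_id: int
    partition: Tuple                  # partition values, e.g. ("<ACCOUNT_ID>",)
    unpartitioned_spec: bool = False  # True if the delete file's spec is unpartitioned


def equality_delete_applies(data: FileEntry, delete: FileEntry) -> bool:
    """Simplified applicability check: sequence number first, then partition."""
    if data.content != 0 or delete.content != 2:
        raise ValueError("expected a data file (content=0) and an equality delete file (content=2)")
    # Strictly lower: a data file with an EQUAL sequence number is not affected.
    if data.data_sequence_number >= delete.data_sequence_number:
        return False
    # A delete written with an unpartitioned spec applies across all partitions.
    if delete.unpartitioned_spec:
        return True
    # Otherwise the spec id and partition values must match.
    return (data.spec_id, data.partition) == (delete.spec_id, delete.partition)


# Scenario from this issue: data + equality delete committed at the same sequence number.
partition = ("<ACCOUNT_ID>",)
data = FileEntry(content=0, data_sequence_number=7, spec_id=0, partition=partition)
delete = FileEntry(content=2, data_sequence_number=7, spec_id=0, partition=partition)
print(equality_delete_applies(data, delete))  # False: the row stays visible
```

With `data_sequence_number=6` on the data file, the same call returns `True`, which is the only case in which the delete hides the row.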
   
   ### Environment
   
   - Iceberg: 1.10.1
   - Flink: 2.2.0
   - Catalog: REST (Gravitino)
   - Table format: v2
   - Table write mode: upsert + merge-on-read
   - Readers tested: Flink SQL, Trino, StarRocks
   
   ### Expected behavior
   If a DELETE is emitted for a key, the final visible state should reflect the 
deletion (unless a later insert/update for that key exists).
   
   ### Minimal evidence query
   
   ```sql
   SELECT
     snapshot_id,
     status,
     sequence_number,
     data_file.content,
     data_file.file_path
   FROM "versioned_profile_labels$all_entries"
   WHERE data_file.file_path LIKE '%account_id=<ACCOUNT_ID>%'
   ORDER BY sequence_number DESC;
   ```
   
   ### Result example
   
   - same snapshot_id
   - same sequence_number
   - both content=0 and content=2
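The pattern in the result example can be checked mechanically. Below is a minimal Python sketch, assuming the query result has already been fetched as tuples in the column order of the query above; `co_committed_sequence_numbers` is a hypothetical helper, and the rows are made-up placeholders, not real output. It does not filter on `status`, so entries from older snapshots are included as well.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# One row of the query above, in column order:
# (snapshot_id, status, sequence_number, content, file_path)
Row = Tuple[int, int, int, int, str]


def co_committed_sequence_numbers(rows: Iterable[Row]) -> List[int]:
    """Return the sequence numbers at which both a data file (content=0)
    and an equality delete file (content=2) appear."""
    contents_by_seq: Dict[int, Set[int]] = defaultdict(set)
    for _snapshot_id, _status, seq, content, _path in rows:
        contents_by_seq[seq].add(content)
    return sorted(seq for seq, kinds in contents_by_seq.items() if {0, 2} <= kinds)


# Made-up rows shaped like the result example (IDs and paths are placeholders).
rows = [
    (1002, 1, 42, 0, "account_id=<ACCOUNT_ID>/data-00001.parquet"),
    (1002, 1, 42, 2, "account_id=<ACCOUNT_ID>/eq-delete-00001.parquet"),
    (1001, 1, 41, 0, "account_id=<ACCOUNT_ID>/data-00000.parquet"),
]
print(co_committed_sequence_numbers(rows))  # [42]
```

Any sequence number returned is one where, per the rule above, the equality delete cannot hide the data row written alongside it.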
   
   ### Workarounds tested
   
   - `table.exec.sink.upsert-materialize=NONE` (reduces side effects, but the 
issue can still occur)
   - disabling maintenance (no change for this symptom)
   - a controlled test table can behave correctly, but the production stream 
still reproduces the issue
   
   ### Question
   Is this the expected semantics for Flink sink row-delta commits in upsert mode, 
or should the sink/committer ensure delete/data ordering so that equality deletes 
apply to same-key mutations within the same cycle?
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time

