anais-source commented on issue #15305:
URL: https://github.com/apache/iceberg/issues/15305#issuecomment-3896158320

   Thank you @stevenzwu for the clarification about position deletes within the 
same checkpoint. That helped me narrow the investigation.
   
   After further analysis, I believe the root cause is most likely a 
Flink-side configuration issue rather than an Iceberg bug.
   
   I was using `ChangelogMode.upsert()` (i.e. `keyOnlyDeletes = true`) together 
with `table.exec.sink.upsert-materialize = FORCE`.
   
   With this setup, when a DELETE reaches `SinkUpsertMaterializer` for a key 
that is not present in the materializer state (for example, data preloaded by a 
separate bootstrap/batch job), the delete may be dropped. Since only key fields 
are available, the operator cannot safely reconstruct a full upsert view.
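   To make the suspected failure mode concrete, here is a deliberately simplified toy model. It is not Flink's `SinkUpsertMaterializer` implementation: `Change`, `ToyMaterializer` and `equality_deletes` are hypothetical names, and the model assumes the materializer's state holds only rows that this job has seen itself. A delete for a preloaded key finds no state and is dropped, while the same delete forwarded straight to the writer would produce an equality delete.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Change:
    kind: str              # "+I" (insert/upsert) or "-D" (delete)
    key: int               # primary key value
    row: Optional[dict]    # full row, or None when only key fields are carried


@dataclass
class ToyMaterializer:
    # Hypothetical simplification: rows seen by THIS job, keyed by primary key.
    state: dict = field(default_factory=dict)

    def process(self, change: Change) -> list:
        if change.kind == "+I":
            self.state[change.key] = change.row
            return [change]
        if change.key not in self.state:
            # No state for this key (e.g. the row was preloaded by a separate
            # batch job), so the toy materializer drops the delete.
            return []
        del self.state[change.key]
        return [change]


def equality_deletes(changes: list) -> list:
    """Keys of explicit deletes reaching a toy writer (each -> one equality delete)."""
    return [c.key for c in changes if c.kind == "-D"]


# Key 42 was written by a bootstrap job; this streaming job never saw it.
key_only_delete = Change("-D", 42, None)
print(equality_deletes(ToyMaterializer().process(key_only_delete)))  # []

# Bypassing the materializer, a full-row delete reaches the writer directly.
full_row_delete = Change("-D", 42, {"id": 42, "name": "preloaded"})
print(equality_deletes([full_row_delete]))  # [42]
```

   In this model, a delete for a key the job did insert itself still passes through the materializer; only keys absent from its state are lost.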
   
   If my understanding is correct, this would mean the Iceberg writer did not 
receive those delete events for pre-existing rows, and therefore no 
corresponding equality delete files were produced for those keys.
   
   Fix attempted:
   
   - Switched to `ChangelogMode.upsert(false)` (i.e. `keyOnlyDeletes = false`), so 
that delete rows carry all fields.
   
   - Switched to `table.exec.sink.upsert-materialize = NONE`, removing the 
materializer from this path.
   
   After this change, deletes appear to be correctly propagated and applied by 
all readers I tested (Trino, StarRocks, Flink SQL).
   
   Also, my earlier observation of a data file and an equality delete file 
sharing the same sequence number may simply reflect the normal upsert write 
pattern (an equality delete on the primary key plus a write of the new row for 
each update), rather than evidence that explicit delete events were applied 
incorrectly.
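   This is consistent with the Iceberg spec rule that an equality delete applies only to data files whose data sequence number is strictly less than the delete's. The toy model below uses hypothetical names and ignores partitions and position deletes. It shows an upsert commit writing both a data file and an equality delete at sequence number 2: the delete removes the older row at sequence 1 but leaves the new row written alongside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRow:
    key: int
    value: str
    data_seq: int   # data sequence number of the data file holding this row


@dataclass(frozen=True)
class EqDelete:
    key: int
    data_seq: int   # data sequence number of the equality delete file


def live_rows(rows: list, deletes: list) -> list:
    """Keep rows not hit by an equality delete with a strictly higher sequence number."""
    return [
        r for r in rows
        if not any(d.key == r.key and r.data_seq < d.data_seq for d in deletes)
    ]


# Seq 1: initial row. Seq 2: the upsert of key 1 commits an equality delete
# for key 1 AND a data file with the new row, both at sequence number 2.
rows = [DataRow(1, "old", 1), DataRow(1, "new", 2)]
deletes = [EqDelete(1, 2)]
print([r.value for r in live_rows(rows, deletes)])  # ['new']
```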
   
   So I’m currently treating this as a Flink changelog/materialization 
configuration issue in my pipeline, not an Iceberg core bug.

