Re: [I] Best practices for minimizing rewrites with `DELETE`s? [iceberg-go]

via GitHub Wed, 13 May 2026 13:58:38 -0700


laskoviymishka commented on issue #1077:
URL: https://github.com/apache/iceberg-go/issues/1077#issuecomment-4445159884


   TL;DR: Yes and Yes, *with some nuances*
   
   Q1 — yes, with one small clarification.
   
   `tbl.Delete` is the high-level delete API. With 
`write.delete.mode=merge-on-read`, it writes **position-delete files**. With 
copy-on-write, which is the default, it rewrites data files.
   
   Equality deletes do **not** come from `Delete`. Those require calling the 
separate lower-level `WriteEqualityDeletes` API explicitly, usually for cases 
like CDC where you already have keys.
   
   So your framing is right for `Delete`: the call site doesn’t expose 
“position vs equality”; in MoR it just writes position deletes. But equality 
deletes are a separate path, not another mode of `Delete`.
   
   Q2 — yes, exactly.
   
   Position-delete files are just row-level “`file_path + position` was 
deleted” entries. They don’t help the planner skip files for `subject = 'foo'`.
   
   A scan like `subject = 'foo'` still needs to:
   
   1. read every data file that wasn’t pruned,
   2. evaluate `subject = 'foo'`,
   3. then remove rows that are present in position-delete files.
   
   Predicate-level skipping comes from separate stats-based mechanisms:
   
   * manifest column bounds / file pruning
   * Parquet row-group statistics
   
   Those don’t know that you “deleted all foo rows.” They only look at min/max 
stats. So if `foo` is spread across files whose `subject` bounds still include 
`foo`, those files still get touched. The delete reduces the final row count, 
but not necessarily the planner’s file or row-group set.
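
To make the distinction concrete, here is a minimal toy model in Go. It does not use iceberg-go at all: `DataFile`, `PositionDelete`, `mayContain`, and `scanEq` are hypothetical names invented for this sketch, and real planning is far more involved. It only illustrates that min/max pruning decides which files get opened, while position deletes only filter rows after the fact.

```go
package main

import "fmt"

// DataFile is a hypothetical stand-in for a manifest entry: a path, the
// min/max bounds of the subject column, and the row values themselves.
type DataFile struct {
	Path       string
	MinSubject string
	MaxSubject string
	Subjects   []string // one value per row; the index is the row position
}

// PositionDelete mirrors the (file_path, position) shape of a position delete.
type PositionDelete struct {
	Path string
	Pos  int
}

// mayContain is the only question min/max stats can answer: could v be here?
func mayContain(f DataFile, v string) bool {
	return f.MinSubject <= v && v <= f.MaxSubject
}

// scanEq evaluates subject = v. It returns how many files were opened and
// how many matching rows survive after position deletes are applied.
func scanEq(files []DataFile, deletes []PositionDelete, v string) (opened, rows int) {
	deleted := make(map[PositionDelete]bool, len(deletes))
	for _, d := range deletes {
		deleted[d] = true
	}
	for _, f := range files {
		if !mayContain(f, v) {
			continue // pruned by stats; position deletes play no role here
		}
		opened++
		for pos, s := range f.Subjects {
			if s == v && !deleted[PositionDelete{f.Path, pos}] {
				rows++
			}
		}
	}
	return opened, rows
}

// sampleFiles builds three files; only c.parquet has bounds that exclude "foo".
func sampleFiles() []DataFile {
	return []DataFile{
		{"a.parquet", "bar", "zed", []string{"bar", "foo", "zed"}},
		{"b.parquet", "egg", "goo", []string{"egg", "foo", "goo"}},
		{"c.parquet", "abc", "abd", []string{"abc", "abd"}},
	}
}

func main() {
	files := sampleFiles()

	opened, rows := scanEq(files, nil, "foo")
	fmt.Println("before delete:", opened, "files opened,", rows, "rows")

	// Delete every foo row via position deletes (merge-on-read style).
	deletes := []PositionDelete{{"a.parquet", 1}, {"b.parquet", 1}}
	opened, rows = scanEq(files, deletes, "foo")
	fmt.Println("after delete: ", opened, "files opened,", rows, "rows")
}
```

In this toy, the matching row count drops from 2 to 0 after the delete, but the same two files are opened both times, because their `subject` bounds are unchanged. Only a rewrite of those files (copy-on-write, or a later compaction) would tighten the bounds enough for them to be pruned.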
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

