Hi Jordan,

FYI, Anton explained his rationale for not adding total-dvs in the original PR [1]. You may also refer to iceberg-java's implementation [2] for scan planning, which handles both position deletes and deletion vectors in a straightforward way.
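To illustrate the kind of file-level distinction scan planning can make, here is a minimal sketch. It is not iceberg-java's code: `DeleteEntry`, `is_dv` and `classify` are hypothetical names, and the sketch assumes (as I read the V3 spec) that a deletion vector shows up as a position-delete entry whose file format is Puffin, while legacy position deletes use Parquet, Avro or ORC.

```python
from dataclasses import dataclass
from enum import Enum


class Content(Enum):
    # Content codes used for manifest entries in the Iceberg spec.
    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2


@dataclass(frozen=True)
class DeleteEntry:
    # Hypothetical, simplified stand-in for a delete-manifest entry.
    path: str
    content: Content
    file_format: str  # e.g. "parquet", "avro", "orc", "puffin"


def is_dv(entry: DeleteEntry) -> bool:
    # Assumption: a DV is a position-delete entry stored in a Puffin file.
    return (entry.content is Content.POSITION_DELETES
            and entry.file_format.lower() == "puffin")


def classify(entries):
    """Bucket delete entries by the read path each one would need."""
    buckets = {"dv": [], "position": [], "equality": []}
    for e in entries:
        if e.content is Content.EQUALITY_DELETES:
            buckets["equality"].append(e.path)
        elif is_dv(e):
            buckets["dv"].append(e.path)
        elif e.content is Content.POSITION_DELETES:
            buckets["position"].append(e.path)
    return buckets


entries = [
    DeleteEntry("deletes/pos-1.parquet", Content.POSITION_DELETES, "parquet"),
    DeleteEntry("deletes/dv-1.puffin", Content.POSITION_DELETES, "puffin"),
    DeleteEntry("deletes/eq-1.parquet", Content.EQUALITY_DELETES, "parquet"),
]
buckets = classify(entries)
print(buckets)
```

Both the legacy file and the DV would count toward total-position-deletes, yet they fall into different buckets once the entries themselves are inspected, which is why the distinction is usually made during planning rather than from the snapshot summary.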
I'm curious which language you are building your engine in. I think all implementations need to handle this, so you shouldn't need to build your own.

[1] https://github.com/apache/iceberg/pull/11464/files#r1828388869
[2] https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/DeleteFileIndex.java

Regards,
Manu

On Fri, Jul 25, 2025 at 12:13 AM Jordano Mark <jordanom...@gmail.com> wrote:

> Hi everyone, below I'd like to give some context for an observation I've
> made, in the hope of discussing it with the community.
>
> *Context:*
>
> Some query engines construct scan plans dynamically based on the metrics
> provided in an Iceberg table's metadata.json. For example, when an engine
> encounters a table with equality deletes, it may rely on the
> 'total-equality-deletes' metric (as defined in the Iceberg specification
> here: https://iceberg.apache.org/spec/#metrics) to determine whether
> equality delete handling logic needs to be engaged during scan planning.
>
> A similar approach is commonly taken for position deletes. Engines may use
> the 'total-position-deletes' metric to decide whether position deletes
> need to be accounted for. However, with the introduction of Deletion
> Vectors (DV) in Iceberg V3, the interpretation of the
> 'total-position-deletes' field becomes ambiguous.
>
> *Problem:*
>
> The core issue is this: when total-position-deletes > 0 in a V3 table, it
> may indicate:
>
> - Legacy position delete files (V2) exist
> - Deletion vectors (V3) exist
> - Or both
>
> This ambiguity introduces complexity in scan planning. In cases where the
> physical plan for reading legacy position deletes differs meaningfully from
> reading deletion vectors, *engines must conservatively assume both
> mechanisms might be in play*, even if only one is present. This can lead
> to unnecessarily complex or suboptimal planning.
>
> I've noticed there is an 'added-dvs' metric, but no 'total-dvs' equivalent
> listed in the Iceberg spec's Metrics
> <https://iceberg.apache.org/spec/#metrics> section. As a result,
> total-position-deletes appears to serve as a catch-all for both V2 and V3
> position deletes. For engines that rely solely on snapshot-level metrics,
> this becomes a blind spot. The issue extends beyond the transition period
> between V2 and V3, too: even after migrating fully to V3, a table might
> still retain legacy delete files. Currently, there appears to be no
> consistent, guaranteed way to prove at the metadata level that only V3
> deletion vectors are in use. Some inference is possible by walking the
> snapshot history and aggregating metrics, but this is fragile and
> case-specific.
>
> It is not viable to perform manifest scans at runtime to infer delete
> formats.
>
> I'm curious whether others in the community have encountered this
> challenge and, if so, how you're addressing it. Is there an established
> pattern to help distinguish V2 vs V3 deletes at the metadata level, without
> relying on manifest/file-level inspection?
>
> Looking forward to hearing your thoughts.
>
> Best,
>
> *Jordan*