There is no total number of DVs, just as there is no total number of equality delete files or position delete files. Those snapshot metrics simply weren't tracked, so we didn't provide an equivalent for DVs when DVs were added. If we feel there is value in tracking those metrics now, let's add them?
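For reference, here is a minimal sketch of what an engine can and cannot learn from the snapshot summary alone. The key names `total-position-deletes`, `total-equality-deletes` and `added-dvs` are the ones mentioned in this thread; the helper `delete_kinds_from_summary` and the sample summary are hypothetical, and treating a missing key as zero is an assumption of the sketch.

```python
def delete_kinds_from_summary(summary: dict[str, str]) -> dict[str, bool]:
    """Report which delete-handling paths a planner may need, using summary totals only."""
    # Summary values are strings; a missing key is treated as zero (assumption).
    equality = int(summary.get("total-equality-deletes", "0"))
    position = int(summary.get("total-position-deletes", "0"))
    return {
        "equality_deletes": equality > 0,
        # As described later in this thread, this total does not say whether
        # the deleted positions come from V2 delete files or from V3 DVs.
        "position_deletes": position > 0,
    }


# Hypothetical summary of a snapshot in a V3 table.
summary = {
    "total-equality-deletes": "0",
    "total-position-deletes": "42",
    "added-dvs": "1",
}
result = delete_kinds_from_summary(summary)
```

Note that `added-dvs` only describes the current snapshot, so by itself it cannot stand in for a table-wide total.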
That said, I am not sure why the physical plan would have to differ depending on whether the table has V2 or V3 position deletes. In Spark and throughout the core library, both types of position deletes are loaded into a Roaring bitmap, which has been Iceberg's in-memory representation of position deletes basically since position deletes were introduced in V2. If an engine relies on PositionDeleteIndex from the core library, it shouldn't matter whether there are V2 deletes, V3 deletes, or a mix. The core library hides that complexity from the engine. That's why I am not convinced physical plans for V2 and V3 position deletes should differ.

- Anton

On Mon, Jul 28, 2025 at 7:58 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi
>
> My understanding of the problem here is that it occurs during the "transition" period when updating from V2 to V3. The reader/writer can check format-version to see what to expect in terms of DVs (metrics).
>
> Regards
> JB
>
> On Mon, Jul 28, 2025 at 6:43 AM Manu Zhang <owenzhang1...@gmail.com> wrote:
> >
> > Hi Jordan,
> >
> > FYI, Anton explained his rationale for not adding total-dvs in the original PR [1].
> > You may also refer to iceberg-java's implementation [2] for scan planning, which handles both position deletes and deletion vectors in a straightforward way.
> >
> > I'm curious which language you are building your engine in. I think all implementations need to handle this and you don't need to build your own.
> >
> > 1. https://github.com/apache/iceberg/pull/11464/files#r1828388869
> > 2. https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/DeleteFileIndex.java
> >
> > Regards,
> > Manu
> >
> > On Fri, Jul 25, 2025 at 12:13 AM Jordano Mark <jordanom...@gmail.com> wrote:
> >>
> >> Hi everyone, below I intend to give context for an observation I’ve made, in hopes of discussing it with the community.
> >>
> >> Context:
> >>
> >> Some query engines construct scan plans dynamically based on the metrics provided in an Iceberg table's metadata.json. For example, when an engine encounters a table with equality deletes, it may rely on the 'total-equality-deletes' metric (as defined in the Iceberg specification here: https://iceberg.apache.org/spec/#metrics) to determine whether equality delete handling logic needs to be engaged during scan planning.
> >>
> >> A similar approach is commonly taken for position deletes. Engines may use the 'total-position-deletes' metric to decide whether position deletes need to be accounted for. However, with the introduction of Deletion Vectors (DV) in Iceberg V3, this interpretation of the 'total-position-deletes' field becomes more ambiguous.
> >>
> >> Problem:
> >>
> >> The core issue is this: when total-position-deletes > 0 in a V3 table, it may indicate:
> >>
> >> Legacy position delete files (V2) exist
> >>
> >> Deletion vectors (V3) exist
> >>
> >> Or both
> >>
> >> This ambiguity introduces complexity in scan planning. In cases where the physical plan for reading legacy position deletes differs meaningfully from reading deletion vectors, engines must conservatively assume both mechanisms might be in play, even if only one is present. This can lead to unnecessarily complex or suboptimal planning.
> >>
> >> I’ve noticed there is an 'added-dvs' metric, but no 'total-dvs' equivalent listed in the Iceberg spec’s Metrics section. As a result, total-position-deletes appears to serve as a catch-all for both V2 and V3 position deletes. For engines that rely solely on snapshot-level metrics, this becomes a blind spot. The issue extends beyond the transition period between V2 and V3 too: even after migrating fully to V3, a table might still retain legacy delete files.
> >> Currently, there appears to be no consistent, guaranteed way to prove at the metadata level that only V3 deletion vectors are in use. Some inference is possible by walking the snapshot history and aggregating metrics, but this is fragile and case-specific.
> >>
> >> It is not viable to perform manifest scans at runtime to infer delete formats.
> >>
> >> I’m curious if others in the community have encountered this challenge and, if so, how you’re addressing it. Is there an established pattern to help distinguish V2 vs V3 deletes at the metadata level, without relying on manifest/file-level inspection?
> >>
> >> Looking forward to hearing your thoughts.
> >>
> >> Best,
> >>
> >> Jordan
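
As a rough illustration of the point made in the reply at the top of this thread, the sketch below feeds positions from a V2 position delete file and from a V3 deletion vector into the same kind of in-memory index, so the reader's filtering code is identical either way. This is not the core library's code: `SimplePositionDeleteIndex` is a hypothetical stand-in for PositionDeleteIndex, a plain Python set stands in for the Roaring bitmap, and all positions are made up.

```python
from typing import Iterable, Iterator


class SimplePositionDeleteIndex:
    """Hypothetical stand-in for a position delete index (a set replaces the bitmap)."""

    def __init__(self) -> None:
        self._deleted: set[int] = set()

    def delete(self, positions: Iterable[int]) -> None:
        # The source format (V2 delete file or V3 DV) does not matter here.
        self._deleted.update(positions)

    def is_deleted(self, pos: int) -> bool:
        return pos in self._deleted


def live_rows(rows: list[str], index: SimplePositionDeleteIndex) -> Iterator[str]:
    """Yield rows whose ordinal position in the data file is not deleted."""
    for pos, row in enumerate(rows):
        if not index.is_deleted(pos):
            yield row


v2_index = SimplePositionDeleteIndex()
v2_index.delete([1, 3])  # made-up positions decoded from a V2 position delete file

dv_index = SimplePositionDeleteIndex()
dv_index.delete([0, 2])  # made-up positions decoded from a V3 deletion vector

rows = ["a", "b", "c", "d"]
after_v2 = list(live_rows(rows, v2_index))  # same read path for both indexes
after_dv = list(live_rows(rows, dv_index))
```

Only the step that decodes positions from storage differs between the two formats; everything after the index is built is shared, which is the argument for a single physical plan.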