For me it's a question of whether engines should be expected to use some
of these higher-order abstractions in the core library, and of the
extent to which the core library exposes lower-level building blocks for
engines choosing to go in a different direction.

In Dremio, while we started with a model where we'd read all V2 position
deletes as part of the data file scan operator, we ended up using an
approach for V2 deletes where we'd scan data and delete files separately
and then use a post-scan anti-join between the two to perform the
filtering. While this performs worse when each V2 delete file references
only a single data file, it was significantly better as the number of
data files referenced by each V2 delete file increased.

With DVs it makes sense to revert to a model where the data file scan
operator is responsible for reading the DV as well, which is the
direction we're taking. The challenge is that we would rather not pay
the cost of the V2 delete scan and anti-join when no V2 deletes are
present. We don't read manifests during query planning proper, so today
we'd only discover this once query execution has started. Absent the
proposal here, the approach we might have to take is to cache
information about the presence of V2 deletes in the table metadata cache
we use during planning - though that moves us from a model where we only
read metadata.json to populate the cache to one where we have to read
manifests as well, a significant increase in cost. Having some
information at the snapshot level indicating the presence of V2 position
deletes solves all of these concerns very nicely.
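
To make the desired check concrete, here is a minimal Java sketch of the
planning-time decision we'd like to make. The "has-v2-position-deletes"
summary key is hypothetical and only illustrates the kind of
snapshot-level signal being proposed; the fallback uses the existing
"total-position-deletes" metric from the snapshot summary.

import java.util.Map;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;

// Sketch: decide whether the V2 delete-scan + anti-join branch is needed
// without reading any manifests.
static boolean needsV2DeleteBranch(Table table) {
  Snapshot snapshot = table.currentSnapshot();
  if (snapshot == null) {
    return false; // empty table, nothing to delete
  }
  Map<String, String> summary = snapshot.summary();
  // HYPOTHETICAL key: not part of the Iceberg spec today.
  String flag = summary.get("has-v2-position-deletes");
  if (flag != null) {
    return Boolean.parseBoolean(flag);
  }
  // Without a snapshot-level signal we must conservatively assume V2
  // deletes may exist whenever any position deletes are reported.
  long totalPositionDeletes =
      Long.parseLong(summary.getOrDefault("total-position-deletes", "0"));
  return totalPositionDeletes > 0;
}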

Regards,
-Scott

On Mon, Jul 28, 2025 at 7:03 AM Anton Okolnychyi <aokolnyc...@gmail.com>
wrote:

> There is no total number of DVs, just like there is no total number of
> equality delete files or position delete files. Those types of snapshot
> metrics simply weren't tracked, so we didn't provide an equivalent one
> for DVs when DVs were added. If we feel there is value in tracking
> those metrics now, let's add them?
>
> That said, I am not sure why the physical plan would have to differ
> depending on whether the table has V2 or V3 position deletes. In Spark
> and throughout the core library, both types of position deletes are
> loaded into a Roaring bitmap, which has served as Iceberg's in-memory
> representation of position deletes since their introduction in V2. If
> an engine relies on PositionDeleteIndex from the core library, it
> shouldn't matter whether there are V2 deletes, V3 deletes, or a mix:
> the core library hides that complexity from the engine. That's why I am
> not convinced physical plans for V2 and V3 position deletes should
> differ.
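>
> As a rough engine-side sketch (assuming the core library has already
> loaded the deletes for a data file into a PositionDeleteIndex, e.g. its
> bitmap-backed implementation), the consuming code is the same no matter
> where the deletes came from:
>
> import org.apache.iceberg.deletes.PositionDeleteIndex;
>
> // Sketch: filter rows against an already-loaded PositionDeleteIndex.
> // The index is bitmap-backed whether it was populated from V2 position
> // delete files, V3 DVs, or a mix of both.
> static boolean isLive(PositionDeleteIndex deletes, long rowPosition) {
>   return deletes.isEmpty() || !deletes.isDeleted(rowPosition);
> }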
>
> - Anton
>
> On Mon, Jul 28, 2025 at 7:58 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
>> Hi
>>
>> My understanding is that the problem here arises during the
>> "transition" period when upgrading from V2 to V3. The reader/writer
>> can check the format-version to see what to expect in terms of DVs
>> (metrics).
>>
>> Regards
>> JB
>>
>> On Mon, Jul 28, 2025 at 6:43 AM Manu Zhang <owenzhang1...@gmail.com>
>> wrote:
>> >
>> > Hi Jordan,
>> >
>> > FYI, Anton explained his rationale for not adding total-dvs in the
>> > original PR [1].
>> > You may also refer to iceberg-java's implementation [2] of scan
>> > planning, which handles both position deletes and deletion vectors
>> > in a straightforward way.
>> >
>> > I'm curious which language you are building your engine in. I think
>> > all implementations need to handle this, so you don't need to build
>> > your own.
>> >
>> > 1. https://github.com/apache/iceberg/pull/11464/files#r1828388869
>> > 2.
>> https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/DeleteFileIndex.java
>> >
>> > Regards,
>> > Manu
>> >
>> > On Fri, Jul 25, 2025 at 12:13 AM Jordano Mark <jordanom...@gmail.com>
>> wrote:
>> >>
>> >> Hi everyone, below I'd like to contextualize an observation I've
>> >> made, in the hope of discussing it with the community.
>> >>
>> >>
>> >> Context:
>> >>
>> >> Some query engines construct scan plans dynamically based on the
>> >> metrics provided in an Iceberg table's metadata.json. For example,
>> >> when an engine encounters a table with equality deletes, it may rely
>> >> on the 'total-equality-deletes' metric (as defined in the Iceberg
>> >> specification here: https://iceberg.apache.org/spec/#metrics) to
>> >> determine whether equality delete handling logic needs to be engaged
>> >> during scan planning.
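>> >>
>> >> As a minimal Java sketch of this pattern (the metric names come from
>> >> the spec's snapshot summary; the planning hook itself is
>> >> illustrative):
>> >>
>> >> import java.util.Map;
>> >> import org.apache.iceberg.Snapshot;
>> >> import org.apache.iceberg.Table;
>> >>
>> >> // Sketch: enable delete-handling branches from snapshot-level
>> >> // metrics alone, without touching manifests.
>> >> static void planScan(Table table) {
>> >>   Snapshot current = table.currentSnapshot();
>> >>   Map<String, String> summary =
>> >>       current == null ? Map.of() : current.summary();
>> >>   boolean hasEqualityDeletes =
>> >>       Long.parseLong(summary.getOrDefault("total-equality-deletes", "0")) > 0;
>> >>   boolean hasPositionDeletes =
>> >>       Long.parseLong(summary.getOrDefault("total-position-deletes", "0")) > 0;
>> >>   // ... wire the corresponding operators into the plan as needed ...
>> >> }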
>> >>
>> >> A similar approach is commonly taken for position deletes: engines
>> >> may use the 'total-position-deletes' metric to decide whether
>> >> position deletes need to be accounted for. However, with the
>> >> introduction of Deletion Vectors (DVs) in Iceberg V3, the
>> >> interpretation of the 'total-position-deletes' field becomes
>> >> ambiguous.
>> >>
>> >>
>> >> Problem:
>> >>
>> >> The core issue is this: when total-position-deletes > 0 in a V3
>> >> table, it may indicate that:
>> >>
>> >> - legacy position delete files (V2) exist,
>> >> - deletion vectors (V3) exist,
>> >> - or both.
>> >>
>> >> This ambiguity introduces complexity in scan planning. In cases
>> >> where the physical plan for reading legacy position deletes differs
>> >> meaningfully from the one for reading deletion vectors, engines must
>> >> conservatively assume both mechanisms might be in play, even if only
>> >> one is present. This can lead to unnecessarily complex or suboptimal
>> >> planning.
>> >>
>> >> I’ve noticed there is an 'added-dvs' metric, but no 'total-dvs'
>> >> equivalent listed in the Iceberg spec’s Metrics section. As a
>> >> result, total-position-deletes appears to serve as a catch-all for
>> >> both V2 and V3 position deletes. For engines that rely solely on
>> >> snapshot-level metrics, this becomes a blind spot. The issue extends
>> >> beyond the transition period between V2 and V3, too: even after
>> >> migrating fully to V3, a table might still retain legacy delete
>> >> files. Currently, there appears to be no consistent, guaranteed way
>> >> to prove at the metadata level that only V3 deletion vectors are in
>> >> use. Some inference is possible by walking the snapshot history and
>> >> aggregating metrics, but this is fragile and case-specific.
>> >>
>> >> It is not viable to perform manifest scans at runtime to infer
>> >> delete formats.
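>> >>
>> >> For reference, the file-level inspection this would require looks
>> >> roughly like the sketch below (illustrative only; it leans on the
>> >> fact that, per the spec, V3 DVs are stored in Puffin files while V2
>> >> position delete files are not):
>> >>
>> >> import org.apache.iceberg.DeleteFile;
>> >> import org.apache.iceberg.FileContent;
>> >> import org.apache.iceberg.FileFormat;
>> >> import org.apache.iceberg.FileScanTask;
>> >> import org.apache.iceberg.Table;
>> >> import org.apache.iceberg.io.CloseableIterable;
>> >>
>> >> // Sketch: detect legacy (V2) position delete files by planning the
>> >> // scan and inspecting each task's delete files. This reads
>> >> // manifests, which is exactly the cost we want to avoid.
>> >> static boolean hasLegacyPositionDeletes(Table table) throws Exception {
>> >>   try (CloseableIterable<FileScanTask> tasks =
>> >>       table.newScan().planFiles()) {
>> >>     for (FileScanTask task : tasks) {
>> >>       for (DeleteFile delete : task.deletes()) {
>> >>         if (delete.content() == FileContent.POSITION_DELETES
>> >>             && delete.format() != FileFormat.PUFFIN) {
>> >>           return true; // V2-style position delete file
>> >>         }
>> >>       }
>> >>     }
>> >>   }
>> >>   return false;
>> >> }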
>> >>
>> >> I’m curious if others in the community have encountered this
>> >> challenge, and if so, how you’re addressing it. Is there an
>> >> established pattern to help distinguish V2 vs. V3 deletes at the
>> >> metadata level, without relying on manifest/file-level inspection?
>> >>
>> >>
>> >> Looking forward to hearing your thoughts.
>> >>
>> >> Best,
>> >>
>> >> Jordan
>>
>
