Re: [Discuss] Column Update File Representation

Xiening Dai Tue, 02 Jun 2026 10:31:52 -0700

> I think the missing building block to make this eq-delete rewrite work is
> the decision made yesterday, to bump the base file-level sequence number
> when adding a column file.


That would mean a data file could have a different sequence number in
different snapshots, which breaks this invariant that's currently in
the spec:

The sequence_number field represents the data sequence number and must never 
change after a file is added to the dataset.

Not saying this is a no-go, but we need to carefully evaluate the
implications of this change.
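
For reference, here is a minimal sketch of why the bump matters. It assumes the current delete-matching rule: an equality delete applies when the data file's data sequence number is strictly lower than the delete's, and a position delete or DV applies when it is lower or equal. It also ignores partition matching. `DeleteFile`, `applies`, and `deletes_to_apply` are hypothetical names for illustration only. While f0 keeps its original sequence number, both d0 and d1 apply. Only after f0's sequence number is rewritten to the column update's does d0 drop out, and that post-commit change is exactly what the invariant rules out.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of the spec's delete-matching rule.
# Partition matching and the file-path scoping of DVs are ignored.


@dataclass(frozen=True)
class DeleteFile:
    name: str
    kind: str  # "equality" or "dv"
    seq: int   # data sequence number of the delete file


def applies(data_seq: int, delete: DeleteFile) -> bool:
    if delete.kind == "equality":
        # Equality deletes only apply to strictly older data.
        return data_seq < delete.seq
    # Position deletes/DVs apply to data at the same or an older sequence number.
    return data_seq <= delete.seq


def deletes_to_apply(data_seq: int, deletes: list[DeleteFile]) -> list[str]:
    return [d.name for d in deletes if applies(data_seq, d)]


# f0 added at seq 1, equality delete d0 at seq 2, then a column update at
# seq 3 that also writes DV d1 covering the same positions as d0.
deletes = [DeleteFile("d0", "equality", 2), DeleteFile("d1", "dv", 3)]

print(deletes_to_apply(1, deletes))  # f0 keeps seq 1: ['d0', 'd1']
print(deletes_to_apply(3, deletes))  # f0 bumped to seq 3: ['d1']
```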


On 2026/06/02 12:42:10 Gábor Kaszab wrote:
> Thanks for the summary, Amogh!
> 
> I think the missing building block to make this eq-delete rewrite work is
> the decision made yesterday, to bump the base file-level sequence number
> when adding a column file. With this, we can make sure that after we have
> rewritten the eq-deletes into DVs in the process of adding column files, we
> don't have to apply the eq-deletes we had previously on the base file.
> 
> Just some thoughts on implementation:
> 
>    - Write path in general: When writing the update file, we designed the
>    PoC so that the writer receives _path and _pos from the base file. With
>    this we can identify which positions are missing and convert them into
>    DVs.
>    - Trailing deletes: The tricky part is when trailing rows are deleted. I
>    see two approaches to get around this:
>       - Broadcast base file row counts to writers (this is what the PoC
>       does): when we receive the last row from the base file with pos X, but
>       we know there are more rows in the base file, we have to add the
>       trailing positions to the DV.
>       - Enrich the input rows fed to the writer with the "_deleted"
>       metadata column: false => write to the update file, true => write the
>       pos to the DV.
> 
> Regards,
> Gabor
> 
> On Mon, Jun 1, 2026 at 10:48 PM Amogh Jahagirdar <[email protected]> wrote:
> 
> > > The real challenge comes from the read path. In the case when we have a
> > data file f0, an equality delete file d0, a column file f1, and the
> > materialized DV d1, how do we reconcile the deletes during read? If we
> > don't do anything special, following the existing spec (based on the
> > sequence number rule), we would apply d0 on f0, and then apply d1 on f1,
> > which should still give us the correct results as both d0 and d1 represent
> > the same set of positions. But this is undesirable because we don't want
> > to load and re-evaluate the old column values. So we need a change in the
> > spec so that in this scenario the new d1 supersedes the existing equality
> > delete file (d0).
> >
> > So given the following invariants/rules:
> >
> > 1. In a dense representation, column updates must carry over all active
> > values for the column (and there's a _pos column referencing the position
> > in the original base file).
> > 2. Column updates must know which rows were deleted (either to omit the
> > row or to materialize the default value).
> > 3. Data sequence numbers are updated on column appends/updates (this would
> > be a spec change in v4). I think reusing the same sequence number is key,
> > so that we don't need one sequence number definition that's temporal, for
> > delete matching, and another that isn't, for column updates. Having a
> > single sequence number simplifies a lot of this.
> > 4. A column update must also rewrite existing equality deletes into DVs.
> >
> > I think this combination (and the fact that DVs are 1:1 with data
> > files) naturally addresses this, because f1 in this example would have
> > the column values for all the active rows. Then the DV d1 just deletes
> > row positions as usual. There's never a need to actually read the old
> > column values in this model.
> >
> > There's a broader discussion around eliminating new equality deletes in
> > v4, but even then this rule would still apply to handle older equality
> > deletes from v3 and earlier, as well as column updates on older data files.
> >
> > We actually talked about this a bit in today's V4 AMT sync
> > <https://youtu.be/7mVes-6pM1c?t=861>
> >
> > Thanks,
> > Amogh Jahagirdar
> >
> > On Mon, Jun 1, 2026 at 12:17 PM Xiening Dai <[email protected]> wrote:
> >
> >> > but we should develop some concreteness around how feasible it is for
> >> > engines to produce the DVs on the column update.
> >>
> >> Actually I don't think this would be a problem. As mentioned, in order to
> >> generate a correct column file, we already need to produce the correct
> >> set of deleted positions; we just need an extra step to materialize these
> >> positions into a DV.
> >>
> >> The real challenge comes from the read path. In the case when we have a
> >> data file f0, an equality delete file d0, a column file f1, and the
> >> materialized DV d1, how do we reconcile the deletes during read? If we
> >> don't do anything special, following the existing spec (based on the
> >> sequence number rule), we would apply d0 on f0, and then apply d1 on f1,
> >> which should still give us the correct results as both d0 and d1 represent
> >> the same set of positions. But this is undesirable because we don't want
> >> to load and re-evaluate the old column values. So we need a change in the
> >> spec so that in this scenario the new d1 supersedes the existing equality
> >> delete file (d0).
> >>
> >> On 2026/05/29 23:21:33 Amogh Jahagirdar wrote:
> >> > One approach that’s helped me reason about all this is to treat each
> >> > base file as its own little mini‑table inside the larger table: the row
> >> > range of the base file keyed by row_id, with column files/deletes just
> >> > layered on top. Once a row is deleted in that mini‑table, it stays
> >> > deleted in that mini‑table’s state (whether that’s via equality deletes
> >> > or DVs), and column updates are just layering changed or additional
> >> > columns on top of whatever rows are still there. Then I can reason about
> >> > "what are the desirable properties of this mini-table?"
> >> >
> >> > Once I look at it that way, stacking equality deletes with column
> >> > updates on the same column, and then forcing the write path to read all
> >> > the older column files when producing new column updates, feels like the
> >> > worst outcome, and it gets worse the more column updates there are for
> >> > the column. It blows up complexity and performance and compromises the
> >> > value of efficient column updates.
> >> >
> >> > If we eliminate that option, I think we’re left with two high‑level
> >> > approaches:
> >> >
> >> >    1. Equality deletes cannot be allowed with column updates. This
> >> >    simplifies both the read and write paths when column update files
> >> >    are present. I would generally prefer this option, but there is a
> >> >    legitimate problem around the “how” of checking for the presence of
> >> >    equality deletes. We can’t rely on snapshot summaries, which means
> >> >    we’d have to look at delete manifests to really know if equality
> >> >    deletes exist. There were ideas in the V4 AMT sync about constraining
> >> >    equality deletes to be in the root manifest; in that model, the
> >> >    amount of work needed to check for equality deletes is bounded by the
> >> >    root size. I’d keep that as a separate open question because there
> >> >    are other challenges with requiring equality deletes to only appear
> >> >    in the root manifest, especially on the upgrade path.
> >> >    2. After an equality delete, subsequent updates must produce a DV. As
> >> >    Xiening highlighted, once you’ve had an equality delete on a column,
> >> >    any subsequent updates on that column would be required to produce a
> >> >    DV (or positional delete) for the deleted positions at the new
> >> >    sequence number, making the original equality delete obsolete. This
> >> >    is attractive because it’s not too constraining for writers: they’re
> >> >    already doing the work of reconciling deleted positions to decide
> >> >    what to write into the column file, so the additional work is
> >> >    basically emitting the DV. The main thing to think through is what
> >> >    exactly the plumbing to engines looks like, but in theory it’s just a
> >> >    matter of plumbing through explicitly deleted positions (or, less
> >> >    ideally, inferring them from a sentinel value in the tuple).
> >> >
> >> >
> >> > So far I’m leaning towards option 2, but we should develop some
> >> > concreteness around how feasible it is for engines to produce the DVs on
> >> > the column update. Again, it should all be theoretically possible by
> >> > plumbing through deleted positions; we shouldn't let implementations
> >> > drive the spec, but I think sniff-testing the practicality of it is well
> >> > worth it to make sure that restriction is reasonably implementable.
> >> >
> >> > Interested in hearing what others think about this one.
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Amogh Jahagirdar
> >> >
> >>
> >
>
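
A minimal sketch of the two write-path approaches Gábor describes for turning missing base-file positions into a DV. It assumes the update writer handles a single base file and sees its surviving rows in ascending `_pos` order. A plain Python set stands in for the DV; actual Iceberg DVs are Roaring bitmaps stored in Puffin files. `dv_from_positions` and `split_by_deleted_flag` are hypothetical names, not Iceberg APIs.

```python
from collections.abc import Iterable

# Hypothetical helpers for a single base file. A plain set of row positions
# stands in for the DV; real Iceberg DVs are Roaring bitmaps in Puffin files.


def dv_from_positions(surviving_pos: Iterable[int], record_count: int) -> set[int]:
    """Approach 1: derive deleted positions from the ascending _pos values the
    update writer received, plus the base file's record count (broadcast to
    the writer) so that trailing deleted rows are not missed."""
    deleted: set[int] = set()
    expected = 0  # next position we would see if nothing were deleted
    for pos in surviving_pos:
        if pos < expected or pos >= record_count:
            raise ValueError(f"unexpected _pos {pos}")
        deleted.update(range(expected, pos))  # a gap means deleted rows
        expected = pos + 1
    # Everything after the last surviving row is a trailing delete.
    deleted.update(range(expected, record_count))
    return deleted


def split_by_deleted_flag(
    rows: Iterable[tuple[int, bool, object]],
) -> tuple[list[tuple[int, object]], set[int]]:
    """Approach 2: each input row is (_pos, _deleted, new_value). Rows with
    _deleted == False go to the update file; True sends _pos to the DV."""
    update_rows: list[tuple[int, object]] = []
    deleted: set[int] = set()
    for pos, is_deleted, value in rows:
        if is_deleted:
            deleted.add(pos)
        else:
            update_rows.append((pos, value))
    return update_rows, deleted


# Base file with 8 rows; the writer only saw positions 0, 1, 3, 4 and 5.
print(sorted(dv_from_positions([0, 1, 3, 4, 5], record_count=8)))  # [2, 6, 7]
# Prints ([(0, 'a'), (2, 'c')], {1})
print(split_by_deleted_flag([(0, False, "a"), (1, True, None), (2, False, "c")]))
```

Approach 1 depends on the writer seeing every surviving position of the base file in order, plus the broadcast row count. Approach 2 needs no row count but requires the engine to route deleted rows through the writer.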
