Re: [Discuss] Column Update File Representation

Xiening Dai Tue, 02 Jun 2026 11:44:28 -0700

We also need to think about the DV-only case.

If we have f0 with dv0, and a column update then generates f1, do we also
bump the sequence number for f0? There are multiple options:


1) We bump the sequence number. Then we need to copy dv0 into a new dv1 and
assign dv1 the same bumped sequence number so that the delete positions don't
get lost.
2) We don't bump the sequence number. Then we don't need to rewrite dv0 and
everything keeps working. But this creates a small inconsistency with the
equality delete case and requires special-case handling in the write path.
3) We bump the sequence number for both the data file f0 and dv0. We don't need
to rewrite the DV; instead we just bump its sequence number as well.
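To make the trade-off concrete, here is a minimal Python sketch of the existing sequence-number matching rule (a DV or position delete applies when the data file's data sequence number is <= the delete's; an equality delete applies only when it is strictly less) and how each option keeps dv0's positions applied. The class and function names are hypothetical, and the model ignores partitions and file paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    name: str
    seq: int  # data sequence number


@dataclass(frozen=True)
class DeleteFile:
    name: str
    seq: int  # data sequence number
    kind: str  # "dv" or "equality"


def applies(delete: DeleteFile, data: DataFile) -> bool:
    # Existing rule: DVs/position deletes apply when data.seq <= delete.seq;
    # equality deletes apply only when data.seq < delete.seq.
    if delete.kind == "dv":
        return data.seq <= delete.seq
    return data.seq < delete.seq


def dv_only_column_update(option: int, f0: DataFile, dv0: DeleteFile, new_seq: int):
    """Hypothetical: return (base file, DV) after a column update committed at new_seq."""
    if option == 1:
        # Bump f0 and copy dv0's positions into a new dv1 at the bumped seq.
        return DataFile(f0.name, new_seq), DeleteFile("dv1", new_seq, "dv")
    if option == 2:
        # Leave f0 and dv0 untouched; only f1 is added.
        return f0, dv0
    if option == 3:
        # Bump both f0 and dv0 in metadata; the DV content is not rewritten.
        return DataFile(f0.name, new_seq), DeleteFile(dv0.name, new_seq, "dv")
    raise ValueError(f"unknown option: {option}")


f0 = DataFile("f0", seq=1)
dv0 = DeleteFile("dv0", seq=2, kind="dv")

# Bumping f0 alone would silently drop dv0's deletes (3 <= 2 is false).
assert not applies(dv0, DataFile("f0", seq=3))

for option in (1, 2, 3):
    base, dv = dv_only_column_update(option, f0, dv0, new_seq=3)
    assert applies(dv, base), option
```

In this model, options 1 and 3 both keep the DV applicable after the bump; they differ only in whether new DV content is written or just a new entry carrying the bumped sequence number.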

I'd suggest we write these details down in a spec change proposal and examine
the read/write workflow carefully.

On 2026/06/02 12:42:10 Gábor Kaszab wrote:
> Thanks for the summary, Amogh!
> 
> I think the missing building block that makes this eq-delete rewrite work is
> the decision made yesterday to bump the base file's file-level sequence number
> when adding a column file. With this, we can make sure that once we have
> rewritten the eq-deletes into DVs while adding column files, we no longer
> have to apply the previous eq-deletes to the base file.
> 
> Just some thoughts on implementation:
> 
>    - Write path in general: When writing the update file, we designed this
>    in the PoC to receive _path and _pos from the base file. With this we can
>    identify which positions are missing and convert them into DVs.
>    - Trailing deletes: The tricky part is when trailing rows are deleted. I
>    see 2 approaches to get around this:
>       - Broadcast base file row counts to writers (this is what the PoC
>       does): when we receive the last row from the base file with pos X, but
>       know there are more rows in the base file, we have to add the trailing
>       positions to the DV.
>       - Enrich the input rows fed to the writer with the "_deleted"
>       metadata column: false => write to the update file, true => write the
>       pos to the DV.
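A minimal Python sketch of the two trailing-delete approaches, assuming the writer sees the rows of one base file (one _path) in ascending _pos order. The function names and the (_pos, _deleted, value) tuple layout are hypothetical, not the PoC's code.

```python
from typing import Iterable, List, Tuple


def deleted_positions(live_positions: Iterable[int], base_row_count: int) -> List[int]:
    """Row-count approach: live_positions are the _pos values the writer received
    for one base file (ascending); every missing position, including trailing
    ones after the last received row, belongs in the DV."""
    deleted: List[int] = []
    expected = 0  # next position if nothing were deleted
    for pos in live_positions:
        if pos < expected or pos >= base_row_count:
            raise ValueError(f"unexpected _pos {pos}")
        deleted.extend(range(expected, pos))  # gap => deleted rows
        expected = pos + 1
    deleted.extend(range(expected, base_row_count))  # trailing deletes
    return deleted


def split_by_deleted_flag(
    rows: Iterable[Tuple[int, bool, object]],
) -> Tuple[List[Tuple[int, object]], List[int]]:
    """_deleted approach: each row is (_pos, _deleted, new_value).
    False => row goes to the update file, True => _pos goes to the DV."""
    update_rows: List[Tuple[int, object]] = []
    dv_positions: List[int] = []
    for pos, is_deleted, value in rows:
        if is_deleted:
            dv_positions.append(pos)
        else:
            update_rows.append((pos, value))
    return update_rows, dv_positions


# Base file with 6 rows; rows 1, 4 and 5 were deleted (4 and 5 are trailing).
assert deleted_positions([0, 2, 3], base_row_count=6) == [1, 4, 5]
```

The row-count approach needs only the base file's record count on each writer; the _deleted approach moves that knowledge upstream into the query plan, so the writer never has to infer gaps.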
> 
> Regards,
> Gabor
> 
> On Mon, Jun 1, 2026 at 22:48, Amogh Jahagirdar <[email protected]> wrote:
> 
> > > The real challenge comes from the read path. Consider the case where we
> > > have a data file f0, an equality delete file d0, a column file f1, and the
> > > materialized DV d1. How do we reconcile the deletes during read? If we
> > > don't do anything special and follow the existing spec (the sequence
> > > number rule), we would apply d0 to f0 and then d1 to f1, which should
> > > still give us the correct results, since d0 and d1 represent the same set
> > > of positions. But this is undesirable because we don't want to load and
> > > re-evaluate the old column values. So we need a spec change so that in
> > > this scenario the new d1 supersedes the existing equality delete file (d0).
> >
> > So, given the following invariants/rules:
> >
> > 1. In a dense representation, column updates must carry over all active
> > values for the column (and there's a _pos column referencing the position
> > in the original base file).
> > 2. Column updates must know which rows were deleted (either to omit the row
> > or to materialize the default value).
> > 3. Data sequence numbers are updated on column appends/updates (this would
> > be a spec change in v4). I think reusing the same sequence number is key,
> > since we don't have separate sequence number definitions, a temporal one
> > for delete matching and a non-temporal one for column updates. Having a
> > single sequence number simplifies a lot of this.
> > 4. A column update must also rewrite existing equality deletes into a DV.
> >
> > I think this combination (and the fact that DVs are 1:1 with data
> > files) naturally addresses this, because f1 in this example would have the
> > column values for all the active rows. Then the DV d1 just deletes row
> > positions as usual. There's never a need to actually read the old column
> > values in this model.
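A minimal Python sketch of how invariants 3 and 4 interact with the existing sequence-number matching rule, using the f0/d0/f1/d1 example. The sequence numbers 1, 2 and 3 are illustrative, the names are hypothetical, and the model assumes f0 and f1 share the bumped data sequence number.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeleteFile:
    name: str
    seq: int  # data sequence number
    kind: str  # "dv" or "equality"


def applicable_deletes(data_seq: int, deletes: List[DeleteFile]) -> List[str]:
    """Existing matching rule: DVs apply when data_seq <= delete.seq,
    equality deletes only when data_seq < delete.seq."""
    return [
        d.name
        for d in deletes
        if (d.kind == "dv" and data_seq <= d.seq)
        or (d.kind == "equality" and data_seq < d.seq)
    ]


d0 = DeleteFile("d0", seq=2, kind="equality")
d1 = DeleteFile("d1", seq=3, kind="dv")  # written by the column update

# Before the column update, f0 (data seq 1) must evaluate d0 against its values.
assert applicable_deletes(1, [d0]) == ["d0"]

# The column update commits at seq 3, bumps f0 to seq 3 and rewrites d0's
# positions into d1, so the read path only applies d1 and never needs
# the old column values to evaluate d0.
assert applicable_deletes(3, [d0, d1]) == ["d1"]
```

No new matching rule is needed in this sketch: the bump alone makes d0 stop matching, and the rewrite into d1 ensures no deleted position is lost.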
> >
> > There's a broader discussion around eliminating new equality deletes in v4,
> > but even then this rule would still apply to handle older equality
> > deletes from v3 and earlier, plus column updates on older data files.
> >
> > We actually talked about this a bit in today's v4 AMT sync
> > <https://youtu.be/7mVes-6pM1c?t=861>
> >
> > Thanks,
> > Amogh Jahagirdar
> >
> > On Mon, Jun 1, 2026 at 12:17 PM Xiening Dai <[email protected]> wrote:
> >
> >> > but we should develop some concreteness around how feasible it is for
> >> > engines to produce the DVs on the column update.
> >>
> >> Actually, I don't think this would be a problem. As mentioned, in order to
> >> generate a correct column file, we already need to produce the correct set
> >> of deleted positions, and we just need an extra step to materialize these
> >> positions into a DV.
> >>
> >> The real challenge comes from the read path. Consider the case where we
> >> have a data file f0, an equality delete file d0, a column file f1, and the
> >> materialized DV d1. How do we reconcile the deletes during read? If we
> >> don't do anything special and follow the existing spec (the sequence
> >> number rule), we would apply d0 to f0 and then d1 to f1, which should
> >> still give us the correct results, since d0 and d1 represent the same set
> >> of positions. But this is undesirable because we don't want to load and
> >> re-evaluate the old column values. So we need a spec change so that in
> >> this scenario the new d1 supersedes the existing equality delete file (d0).
> >>
> >> On 2026/05/29 23:21:33 Amogh Jahagirdar wrote:
> >> > One approach that’s helped me reason about all this is to treat each base
> >> > file as its own little mini‑table inside the larger table: the row range of
> >> > the base file keyed by row_id, and column files/deletes just layer on top.
> >> > Once a row is deleted in that mini‑table, it stays deleted in that
> >> > mini‑table’s state (whether that’s via equality deletes or DVs), and
> >> > column updates are just layering changed or additional columns on top of
> >> > whatever rows are still there. Then I can reason about "what are
> >> > desirable properties of this mini-table".
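The mini-table view can be sketched as a small in-memory model. This is a minimal Python sketch under stated assumptions: the MiniTable class and its methods are hypothetical, and the position within the base file stands in for row_id.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class MiniTable:
    """Hypothetical model of one base file: rows keyed by position (standing in
    for row_id), with column layers and deletes stacked on top."""

    row_count: int
    base_columns: Dict[str, List[object]]
    column_layers: List[Dict[str, Dict[int, object]]] = field(default_factory=list)
    deleted: Set[int] = field(default_factory=set)

    def delete(self, positions: Iterable[int]) -> None:
        # Once deleted, a position stays deleted in this mini-table.
        self.deleted.update(positions)

    def update_column(self, column: str, values_by_pos: Dict[int, object]) -> None:
        # A column update only layers values onto rows that are still live.
        live = {p: v for p, v in values_by_pos.items() if p not in self.deleted}
        self.column_layers.append({column: live})

    def read(self) -> List[Dict[str, object]]:
        rows = []
        for pos in range(self.row_count):
            if pos in self.deleted:
                continue
            row = {c: vals[pos] for c, vals in self.base_columns.items()}
            for layer in self.column_layers:  # later layers win
                for c, vals in layer.items():
                    if pos in vals:
                        row[c] = vals[pos]
            rows.append(row)
        return rows


t = MiniTable(row_count=3, base_columns={"id": [10, 11, 12], "v": ["a", "b", "c"]})
t.delete([1])
t.update_column("v", {0: "A", 1: "B", 2: "C"})  # value for deleted row 1 is ignored
assert t.read() == [{"id": 10, "v": "A"}, {"id": 12, "v": "C"}]
```

Reading never consults a deleted row's old values: the deleted set is checked first, and only then are the base columns and layers merged.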
> >> >
> >> > Once I look at it that way, stacking equality deletes with column updates
> >> > on the same column, and then forcing the write path to read all the older
> >> > column files when producing new column updates, feels like the worst
> >> > outcome; and it gets worse the more column updates there are for the
> >> > column. It blows up complexity and performance and compromises the value
> >> > of efficient column updates.
> >> >
> >> > If we eliminate that option, I think we’re left with two high‑level
> >> > approaches:
> >> >
> >> >    1. Equality deletes cannot be allowed with column updates. This
> >> >    simplifies both the read and write paths when column update files are
> >> >    present. I would generally prefer this option, but there is a
> >> >    legitimate problem around the “how” of checking for the presence of
> >> >    equality deletes. We can’t rely on snapshot summaries, which means we’d
> >> >    have to look at delete manifests to really know if equality deletes
> >> >    exist. There were ideas in the V4 AMT sync about constraining equality
> >> >    deletes to be in the root manifest; in that model, the amount of work
> >> >    needed to check for equality deletes is bounded by the root size. I’d
> >> >    keep that as a separate open question because there are other
> >> >    challenges with requiring equality deletes to only appear in the root
> >> >    manifest, especially on the upgrade path.
> >> >    2. After an equality delete, subsequent updates must produce a DV. As
> >> >    Xiening highlighted, once you’ve had an equality delete on a column,
> >> >    any subsequent updates on that column would be required to produce a DV
> >> >    (or positional delete) for the deleted positions at the new sequence
> >> >    number, making the original equality delete obsolete. This is
> >> >    attractive because it’s not too constraining for writers: they’re
> >> >    already doing the work of reconciling deleted positions to decide what
> >> >    to write into the column file, so the additional work is basically
> >> >    emitting the DV. The main thing to think through is exactly what the
> >> >    plumbing to engines looks like, but in theory it’s just a matter of
> >> >    plumbing through explicitly deleted positions (or, less ideally,
> >> >    inferring them from a sentinel value in the tuple).
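The option 2 restriction can be expressed as a commit-time check. This is a minimal Python sketch: ColumnUpdate and validate_column_update are hypothetical names, and the sketch takes the equality delete sequence numbers as given. Finding them (scanning delete manifests, or only the root manifest) is the open question raised under option 1.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ColumnUpdate:
    base_file: str
    new_seq: int           # sequence number the column update commits at
    dv_seq: Optional[int]  # sequence number of the DV written with it, if any


def validate_column_update(
    base_data_seq: int, equality_delete_seqs: List[int], update: ColumnUpdate
) -> None:
    """Hypothetical commit-time check for option 2: if any equality delete still
    applies to the base file (base_data_seq < eq seq), the column update must
    emit a DV at its own new sequence number, making those deletes obsolete."""
    live_eq_deletes = [s for s in equality_delete_seqs if base_data_seq < s]
    if live_eq_deletes and update.dv_seq != update.new_seq:
        raise ValueError(
            f"{update.base_file}: equality deletes at seqs {live_eq_deletes} apply; "
            f"the column update must write a DV at seq {update.new_seq}"
        )


# f0 at seq 1 has an equality delete at seq 2; a column update at seq 3 that
# also writes a DV at seq 3 passes the check.
validate_column_update(1, [2], ColumnUpdate("f0", new_seq=3, dv_seq=3))
```

Under option 1 the same check would simply reject any column update while live_eq_deletes is non-empty.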
> >> >
> >> >
> >> > So far I’m leaning towards option 2, but we should develop some
> >> > concreteness around how feasible it is for engines to produce the DVs on
> >> > the column update. Again, it should all be theoretically possible by
> >> > plumbing through deleted positions; we shouldn't let implementations
> >> > drive the spec, but I think sniff-testing the practicality of it is well
> >> > worth it to make sure that restriction is reasonably implementable.
> >> >
> >> > Interested in hearing what others think about this one.
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Amogh Jahagirdar
> >> >
> >>
> >
>
