> I think the missing building block to make this eq-delete rewrite work is the decision made yesterday, to bump the base file-level sequence number when adding a column file.
That would mean a data file could have different sequence numbers in different snapshots. That breaks an invariant that is currently in the spec: "The sequence_number field represents the data sequence number and must never change after a file is added to the dataset." I'm not saying this is a no-go, but we need to carefully evaluate the implications of this change.

On 2026/06/02 12:42:10 Gábor Kaszab wrote:
> Thanks for the summary, Amogh!
>
> I think the missing building block to make this eq-delete rewrite work is
> the decision made yesterday, to bump the base file-level sequence number
> when adding a column file. With this, we can make sure that after we have
> rewritten the eq-deletes into DVs in the process of adding column files, we
> don't have to apply the eq-deletes we had previously on the base file.
>
> Just some thoughts on implementation:
>
> - Write path in general: When writing the update file, we designed this
>   in the PoC to receive _path and _pos from the base file. With this we can
>   identify if some positions are missing and we can convert them into DVs
> - Trailing deletes: The tricky part is when trailing rows are deleted. I
>   see 2 approaches to get around this:
>   - Broadcast base file row counts to writers (this is done by the
>     PoC): When we have received the last row from the base file with pos X,
>     but we know there are more rows in the base file, we have to add the
>     trailing positions to the DV
>   - Enrich the input rows fed to the writer with the "_deleted"
>     metadata column. False => write to update file, true => write pos to DV
>
> Regards,
> Gabor
>
> Amogh Jahagirdar wrote (on Mon, Jun 1, 2026, 22:48):
>
> > >The real challenge comes from the read path. In the case when we have a
> > data file f0, an equality delete file d0, and column file f1, and the
> > materialized dv d1. How do we reconcile the deletes during read?
> > If we don't do anything special, following the existing spec (based on
> > the sequence number rule), we would apply d0 on f0, and then apply d1 on
> > f1, which should still give us the correct results as both d0 and d1
> > represent the same set of positions. But this is undesired because we
> > don't want to load and re-evaluate the old column values. So we need a
> > change in the spec so that in this scenario the new d1 supersedes the
> > existing equality delete file (d0).
> >
> > So given the following invariants/rules:
> >
> > 1. In a dense representation, column updates must carry over all active
> >    values for the column (and there's a _pos column referencing the
> >    position from the original base file).
> > 2. Column updates must know what rows were deleted (either to omit the
> >    row or materialize the default value)
> > 3. Data sequence numbers are updated on column appends/updates (this
> >    would be a spec change in v4). I think reusing the same seq. number is
> >    key since we don't have a different sequence number definition that's
> >    temporal in dimension for delete matching and another one that's not
> >    temporal but for column updates. Having a single sequence number
> >    simplifies a lot of this.
> > 4. The requirement that a column update must also rewrite existing
> >    equality deletes into DVs
> >
> > I think this combination (and the fact that DVs are 1:1 with data
> > files) naturally addresses this because
> > f1 in this example would have the column values for all the active rows.
> > Then the DV d1 just deletes row positions as usual. There's never a need
> > to actually read the old column values in this model.
> >
> > There's a broader discussion around eliminating new equality deletes in
> > v4 but in that case this rule would still apply to handle older equality
> > deletes from v3 and earlier + column updates on older data files as well.
> >
> > We actually talked about this a bit in today's v4 amt sync
> > <https://youtu.be/7mVes-6pM1c?t=861>
> >
> > Thanks,
> > Amogh Jahagirdar
> >
> > On Mon, Jun 1, 2026 at 12:17 PM Xiening Dai wrote:
> >
> >> > but we should develop some concreteness around how feasible it is for
> >> > engines to produce the DVs on the column update.
> >>
> >> Actually I don't think this would be a problem. As mentioned, in order to
> >> generate a correct column file, we already need to produce the correct
> >> set of deleted positions, and we just need an extra step to materialize
> >> these positions into a DV.
> >>
> >> The real challenge comes from the read path. In the case when we have a
> >> data file f0, an equality delete file d0, and column file f1, and the
> >> materialized dv d1. How do we reconcile the deletes during read? If we
> >> don't do anything special, following the existing spec (based on sequence
> >> number rule), we would apply d0 on f0, and then apply d1 on f1, which
> >> should still give us the correct results as both d0 and d1 represent the
> >> same set of positions. But this is undesired because we don't want to
> >> load and re-evaluate the old column values. So we need a change in the
> >> spec so that in this scenario the new d1 supersedes the existing equality
> >> delete file (d0).
> >>
> >> On 2026/05/29 23:21:33 Amogh Jahagirdar wrote:
> >> > One approach that's helped me reason about all this is to treat each
> >> > base file as its own little mini-table inside the larger table: the row
> >> > range of the base file keyed by row_id, and column files/deletes just
> >> > layer on top. Once a row is deleted in that mini-table, it stays
> >> > deleted in that mini-table's state (whether that's via equality
> >> > deletes, or DVs), and column updates are just layering changed or
> >> > additional columns on top of whatever rows are still there.
> >> > Then I can reason about "what are desirable properties of this
> >> > mini-table".
> >> >
> >> > Once I look at it that way, stacking equality deletes with column
> >> > updates on the same column, and then forcing the write path to read all
> >> > the older column files when producing new column updates, feels like
> >> > the worst outcome; and it gets worse the more column updates there are
> >> > for the column. It blows up complexity and performance and compromises
> >> > the value of efficient column updates.
> >> >
> >> > If we eliminate that option, I think we're left with two high-level
> >> > approaches:
> >> >
> >> >    1. Equality deletes cannot be allowed with column updates. This
> >> >    simplifies both the read and write paths when column update files
> >> >    are present. I would generally prefer this option but there is a
> >> >    legitimate problem around the "how" for checking for the presence of
> >> >    equality deletes. We can't rely on snapshot summaries, which means
> >> >    we'd have to look at delete manifests to really know if equality
> >> >    deletes exist. There were ideas in the V4 AMT sync about
> >> >    constraining equality deletes to be in the root manifest; in that
> >> >    model, the amount of work needed to check for equality deletes is
> >> >    bounded by the root size. I'd keep that as a separate open question
> >> >    because there are other challenges with requiring equality deletes
> >> >    to only appear in the root manifest, especially on the upgrade path.
> >> >    2. After an equality delete, subsequent updates must produce a DV.
> >> >    As Xiening highlighted, once you've had an equality delete on a
> >> >    column, any subsequent updates on that column would be required to
> >> >    produce a DV (or positional delete) for the deleted positions at the
> >> >    new sequence number, making the original equality delete obsolete.
> >> >    This is attractive because it's not too constraining for writers:
> >> >    they're already doing the work of reconciling deleted positions to
> >> >    decide what to write into the column file, so the additional work is
> >> >    basically emitting the DV. The main thing to think through is how
> >> >    exactly the plumbing to engines looks, but in theory it's just a
> >> >    matter of plumbing through explicitly deleted positions (or, less
> >> >    ideally, inferring them from a sentinel value in the tuple).
> >> >
> >> >
> >> > So far I'm leaning towards option 2, but we should develop some
> >> > concreteness around how feasible it is for engines to produce the DVs on
> >> > the column update. Again, should all be theoretically possible based off
> >> > plumbing deleted positions; we shouldn't let implementations drive the
> >> > spec but I think sniff testing the practicality of it is well worth it
> >> > to make sure that restriction is reasonably implementable.
> >> >
> >> > Interested in hearing what others think about this one.
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Amogh Jahagirdar
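
For reference, the "broadcast base file row counts" approach from Gábor's write-path notes could look roughly like the sketch below. This is only an illustration under stated assumptions, not the PoC code: the function name split_update_rows, the (pos, value) tuple shape, and the plain Python list standing in for a DV are all hypothetical. It assumes the writer receives only the surviving rows of a single base file, sorted by _pos, plus that base file's total row count.

```python
from typing import Iterable, List, Tuple


def split_update_rows(
    rows: Iterable[Tuple[int, object]],
    base_row_count: int,
) -> Tuple[List[Tuple[int, object]], List[int]]:
    """Split writer input into update-file rows and deleted positions.

    rows: (_pos, new_value) pairs for the live rows of ONE base file,
          sorted by _pos (hypothetical input shape).
    base_row_count: total row count of that base file, broadcast to the writer.
    Returns (rows for the update file, positions to put into the DV).
    """
    update_rows: List[Tuple[int, object]] = []
    deleted_positions: List[int] = []
    next_expected = 0
    for pos, value in rows:
        if pos < next_expected or pos >= base_row_count:
            raise ValueError(f"unexpected position {pos}")
        # A gap before this row means the skipped positions were deleted.
        deleted_positions.extend(range(next_expected, pos))
        update_rows.append((pos, value))
        next_expected = pos + 1
    # Trailing deletes: rows after the last one we saw, up to the row count.
    deleted_positions.extend(range(next_expected, base_row_count))
    return update_rows, deleted_positions


# Base file has 6 rows; positions 1, 4 and 5 never reach the writer.
update_rows, dv = split_update_rows([(0, "a"), (2, "c"), (3, "d")], 6)
print(dv)  # [1, 4, 5]
```

Without the row count the writer would see position 3 as the last row and miss 4 and 5, which is why the trailing case needs either the broadcast count or the "_deleted" flag on the input rows.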

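To illustrate the read-path side of the discussion above, here is a small sketch of how the existing sequence-number rules would play out for the f0/d0/f1/d1 example if the base file's data sequence number were bumped on the column update. The class and function names are hypothetical, and the bump itself is the proposal under discussion, not current spec behavior. The two applicability checks follow the existing rules: an equality delete applies to data files with a strictly smaller data sequence number, while a position delete or DV applies when the data file's sequence number is less than or equal to the delete's.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for a manifest entry (name + data sequence number)."""
    name: str
    data_seq: int


def equality_delete_applies(data_file: FileEntry, eq_delete: FileEntry) -> bool:
    # Existing rule: data file's sequence number must be strictly smaller.
    return data_file.data_seq < eq_delete.data_seq


def dv_applies(data_file: FileEntry, dv: FileEntry) -> bool:
    # Existing rule for position deletes / DVs: less than or equal.
    return data_file.data_seq <= dv.data_seq


f0 = FileEntry("f0", data_seq=1)   # base data file
d0 = FileEntry("d0", data_seq=2)   # equality delete
d1 = FileEntry("d1", data_seq=3)   # DV written together with column file f1

# Before the column update, d0 must be applied to f0.
print(equality_delete_applies(f0, d0))         # True

# Proposed: the column update bumps f0's data sequence number to 3.
f0_bumped = replace(f0, data_seq=3)
print(equality_delete_applies(f0_bumped, d0))  # False
print(dv_applies(f0_bumped, d1))               # True
```

With the bump, d0 stops matching under the existing rule and only d1 is applied, so no extra "supersede" rule is needed. The cost is that f0 now carries sequence number 1 in older snapshots and 3 in newer ones, which is exactly the invariant change raised at the top of this reply.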