One approach that’s helped me reason about all this is to treat each base file as its own little mini‑table inside the larger table: the row range of the base file is keyed by row_id, and column files and deletes just layer on top. Once a row is deleted in that mini‑table, it stays deleted in that mini‑table’s state (whether that’s via equality deletes or DVs), and column updates just layer changed or additional columns on top of whatever rows are still there. Then I can reason about what the desirable properties of this mini‑table are.
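To make that mental model concrete, here is a minimal Python sketch. It is purely illustrative and not how any engine or the spec represents this: `MiniTable`, `delete`, `update_columns`, and `scan` are hypothetical names, and rows are keyed by position within the base file as a stand-in for row_id.

```python
from dataclasses import dataclass, field


@dataclass
class MiniTable:
    """One base file viewed as a tiny table keyed by row position (hypothetical model)."""
    base_rows: list                                      # rows as written in the base file
    deleted: set = field(default_factory=set)            # deleted positions, however they were expressed
    column_updates: list = field(default_factory=list)   # oldest first: {position: {column: value}}

    def delete(self, positions):
        # Deletion is sticky: a position never leaves this set.
        self.deleted |= set(positions)

    def update_columns(self, changes):
        # A column update only layers onto rows that are still live.
        live = {p: cols for p, cols in changes.items() if p not in self.deleted}
        self.column_updates.append(live)

    def scan(self):
        # Base row, then each column-update layer in order, skipping deleted positions.
        out = {}
        for pos, row in enumerate(self.base_rows):
            if pos in self.deleted:
                continue
            merged = dict(row)
            for layer in self.column_updates:
                merged.update(layer.get(pos, {}))
            out[pos] = merged
        return out


t = MiniTable([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}])
t.update_columns({0: {"v": "a2"}})
t.delete([1])
t.update_columns({1: {"v": "zzz"}, 2: {"score": 9}})  # the change to position 1 is dropped
result = t.scan()
```

The property this model makes explicit is that a later column update can never resurrect a deleted row.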
Once I look at it that way, stacking equality deletes with column updates on the same column, and then forcing the write path to read all the older column files when producing new column updates, feels like the worst outcome, and it gets worse the more column updates there are for that column. It blows up complexity, hurts performance, and compromises the value of efficient column updates. If we eliminate that option, I think we’re left with two high‑level approaches:

1. Equality deletes cannot be allowed with column updates. This simplifies both the read and write paths when column update files are present. I would generally prefer this option, but there is a legitimate problem around the “how” of checking for the presence of equality deletes. We can’t rely on snapshot summaries, which means we’d have to look at delete manifests to really know whether equality deletes exist. There were ideas in the V4 AMT sync about constraining equality deletes to be in the root manifest; in that model, the amount of work needed to check for equality deletes is bounded by the root size. I’d keep that as a separate open question because there are other challenges with requiring equality deletes to appear only in the root manifest, especially on the upgrade path.

2. After an equality delete, subsequent updates must produce a DV. As Xiening highlighted, once you’ve had an equality delete on a column, any subsequent update on that column would be required to produce a DV (or positional delete) for the deleted positions at the new sequence number, making the original equality delete obsolete. This is attractive because it’s not too constraining for writers: they’re already doing the work of reconciling deleted positions to decide what to write into the column file, so the additional work is basically emitting the DV.
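As a rough illustration of option 2, the sketch below shows a writer that resolves applicable equality deletes to positions while producing a column update and emits those positions as a DV alongside the column file. Everything here is an assumption for illustration: `EqualityDelete` and `write_column_update` are hypothetical names, a Python set of positions stands in for a DV, and `current_rows` is taken to be the row values the writer has already resolved for the base file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EqualityDelete:
    column: str
    value: object
    seq: int  # sequence number at which the delete was committed


def write_column_update(current_rows, base_seq, eq_deletes, existing_dv, new_values):
    """Return (column_file, dv) for one base file (hypothetical writer step)."""
    # Assumption: an equality delete applies to data with a strictly lower
    # sequence number than the delete itself.
    applicable = [d for d in eq_deletes if base_seq < d.seq]
    # Reconcile equality deletes into concrete positions.
    newly_deleted = {
        pos
        for pos, row in enumerate(current_rows)
        if pos not in existing_dv
        and any(row.get(d.column) == d.value for d in applicable)
    }
    # The emitted DV carries both earlier and newly resolved deletions.
    dv = set(existing_dv) | newly_deleted
    # Only live positions get values in the new column file.
    column_file = {pos: val for pos, val in new_values.items() if pos not in dv}
    return column_file, dv


rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
column_file, dv = write_column_update(
    current_rows=rows,
    base_seq=1,
    eq_deletes=[EqualityDelete("id", 2, seq=2)],
    existing_dv={2},
    new_values={0: "a2", 1: "b2", 2: "c2"},
)
```

In this toy model, a reader of the base file at the new sequence number would only need the DV, which is the sense in which the original equality delete becomes obsolete for that file.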
The main thing to think through is how exactly the plumbing to engines looks, but in theory it’s just a matter of plumbing through explicitly deleted positions (or, less ideally, inferring them from a sentinel value in the tuple).

So far I’m leaning towards option 2, but we should develop some concreteness around how feasible it is for engines to produce the DVs on the column update. Again, this should all be theoretically possible by plumbing through deleted positions. We shouldn’t let implementations drive the spec, but I think sniff testing the practicality of it is well worth it to make sure the restriction is reasonably implementable.

Interested in hearing what others think about this one.

Thanks,
Amogh Jahagirdar
