One approach that’s helped me reason about all this is to treat each base
file as its own little mini‑table inside the larger table: the row range of
the base file keyed by row_id, and column files/deletes just layer on top. Once
a row is deleted in that mini‑table, it stays deleted in that mini‑table’s
state (whether that’s via equality deletes, or DVs), and column updates are
just layering changed or additional columns on top of whatever rows are
still there. Then I can reason about "what are desirable properties of this
mini-table".
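
To make that mental model concrete, here is a rough Python sketch of the
mini-table. All names here (MiniTable, column_layers, etc.) are made up for
illustration and are not from the spec or any implementation; it only
captures the two properties above: deletes are sticky, and column updates
only layer onto rows that are still live.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MiniTable:
    """Hypothetical model of one base file: rows keyed by row_id."""

    base: dict[int, dict[str, Any]]  # row_id -> column values from the base file
    deleted: set[int] = field(default_factory=set)
    # Each layer is (column name, {row_id: new value}), applied in order.
    column_layers: list[tuple[str, dict[int, Any]]] = field(default_factory=list)

    def delete(self, row_ids):
        # Once deleted, a row stays deleted, regardless of whether the
        # delete came from an equality delete or a DV.
        self.deleted.update(rid for rid in row_ids if rid in self.base)

    def update_column(self, column, values):
        # A column update only carries values for rows that are still live.
        live = {
            rid: value
            for rid, value in values.items()
            if rid in self.base and rid not in self.deleted
        }
        self.column_layers.append((column, live))

    def scan(self):
        # Start from the live base rows, then apply column layers oldest-first.
        out = {
            rid: dict(cols)
            for rid, cols in self.base.items()
            if rid not in self.deleted
        }
        for column, values in self.column_layers:
            for rid, value in values.items():
                if rid in out:
                    out[rid][column] = value
        return out
```

For example, deleting row_id 1 and then applying a column update that
touches row_ids 0 and 1 changes only row 0; row 1 never reappears.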

Once I look at it that way, stacking equality deletes with column updates
on the same column, and then forcing the write path to read all the older
column files when producing new column updates, feels like the worst
outcome; and it gets worse the more column updates there are for the
column. It blows up complexity and performance and compromises the value of
efficient column updates.

If we eliminate that option, I think we’re left with two high‑level
approaches:

   1. Equality deletes cannot be allowed with column updates. This
   simplifies both the read and write paths when column update files are
   present. I would generally prefer this option but there is a legitimate
   problem around the “how” for checking for the presence of equality deletes. We
   can’t rely on snapshot summaries, which means we’d have to look at delete
   manifests to really know if equality deletes exist. There were ideas in the
   V4 AMT sync about constraining equality deletes to be in the root manifest;
   in that model, the amount of work needed to check for equality deletes is
   bounded by the root size. I’d keep that as a separate open question because
   there are other challenges with requiring equality deletes to only appear
   in the root manifest, especially on the upgrade path.
   2. After an equality delete, subsequent updates must produce a DV. As
   Xiening highlighted, once you’ve had an equality delete on a column, any
   subsequent updates on that column would be required to produce a DV (or
   positional delete) for the deleted positions at the new sequence number,
   making the original equality delete obsolete. This is attractive because
   it’s not too constraining for writers: they’re already doing the work of
   reconciling deleted positions to decide what to write into the column file,
   so the additional work is basically emitting the DV. The main thing to
   think through is how exactly the plumbing to engines looks, but in theory
   it’s just a matter of plumbing through explicitly deleted positions (or,
   less ideally, inferring them from a sentinel value in the tuple).
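
A minimal sketch of what option 2 could look like from the writer's side,
just to illustrate the shape of it. Everything here is hypothetical: the
function name, the in-memory row representation, and modeling the DV as a
sorted list of positions (real DVs are bitmaps, and sequence-number
handling is left out entirely). The point is only that the writer already
resolves deleted positions while producing the column file, so emitting
the DV falls out of the same pass.

```python
def write_column_update(base_rows, delete_column, eq_delete_values, new_value_fn):
    """Produce a column update plus a DV for one base file.

    base_rows: list of row dicts, in file position order (assumption).
    delete_column / eq_delete_values: a simplified equality delete,
        "delete where delete_column is in eq_delete_values".
    new_value_fn: computes the updated column value for a live row.

    Returns (column_file, dv), where column_file maps position -> new value
    and dv is the sorted list of deleted positions.
    """
    column_file = {}
    dv = []
    for pos, row in enumerate(base_rows):
        if row[delete_column] in eq_delete_values:
            # The writer had to reconcile this position anyway to decide
            # what to write; recording it makes the equality delete
            # obsolete for this base file going forward.
            dv.append(pos)
        else:
            column_file[pos] = new_value_fn(row)
    return column_file, dv
```

With three rows whose ids are 1, 2, 3 and an equality delete on id = 2,
this writes values for positions 0 and 2 and a DV covering position 1.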


So far I’m leaning towards option 2, but we should get more concrete about
how feasible it is for engines to produce the DVs on the column update.
Again, it should all be theoretically possible by plumbing deleted
positions through; we shouldn't let implementations drive the spec, but I
think sniff-testing the practicality is well worth it to make sure that
restriction is reasonably implementable.

Interested in hearing what others think about this one.


Thanks,

Amogh Jahagirdar
