I still have concerns with this decision:

> Implementation Details: Specific writer implementation details such as
> choosing between dense or sparse representations will be left to
> individual engines.
> Specification Scope: The specification will not mandate these internal
> implementation choices, provided that engines adhere to writing the
> explicit *_pos* column.

If we do not specify whether the representation should be dense or sparse,
we are effectively requiring all engines to support the sparse
representation, since the dense representation is just a special case of
the sparse one. In practice, this means every implementation must be able
to materialize a dense representation from the sparse form, similar to what
the current Spark implementation does today. While this is certainly
feasible, it introduces an additional step on the read path, which is often
performance-sensitive.

This concern has been raised consistently by representatives of other
Iceberg implementations, and I have not heard a different perspective from
them so far. That said, if the broader group is comfortable accepting this
trade-off, I do not have any further objections to the proposal.

Thanks,
Peter

Anurag Mantripragada <[email protected]> wrote (on Tue, Jun 16, 2026, 20:51):

> Hi all,
>
> It seems this thread has become conflated with the metadata representation
> discussion
> <https://lists.apache.org/thread/7jryw9dfvc02s411twn4o7s5gjrybfxg>. While
> all the points raised here are noted, let's continue those specific parts
> of the conversation in the metadata thread.
>
> Regarding data representation, we discussed the following during this
> <https://www.youtube.com/watch?v=kuxFBm-j5hw&t=3s> sync:
>
>    - Implementation Details: Specific writer implementation details such
>    as choosing between dense or sparse representations will be left to
>    individual engines.
>    - Specification Scope: The specification will not mandate these
>    internal implementation choices, provided that engines adhere to writing
>    the explicit *_pos* column.
>
> Please let me know if you have concerns.
>
> ~ Anurag
>
>
> On Tue, Jun 2, 2026 at 11:44 AM Xiening Dai <[email protected]> wrote:
>
>> We also need to think about the DV only case.
>>
>> If we have f0 with dv0, then we do column update and generate f1.
>> Do we also bump the sequence number for f0 in this case? There are
>> multiple options:
>>
>> 1) We bump the sequence number, then we will need to copy dv0 into dv1
>> and assign the same sequence number to dv1 so that the delete positions
>> won't get lost.
>> 2) We don't bump the sequence number, then we don't need to re-write dv0
>> and everything would remain working. But this creates a small
>> inconsistency with eq delete case, and requires a special case handling
>> at write path.
>> 3) We bump sequence number for both data file f0, and dv0. We don't need
>> to rewrite dv, but instead we bump the sequence number for the dv as well.
>>
>> I'd suggest we write down these details into a spec change proposal and
>> examine the read write work flow carefully.
>>
>> On 2026/06/02 12:42:10 Gábor Kaszab wrote:
>> > Thanks for the summary, Amogh!
>> >
>> > I think the missing building block to make this eq-delete rewrite work
>> > is the decision made yesterday, to bump the base file-level sequence
>> > number when adding a column file. With this, we can make sure that
>> > after we have rewritten the eq-deletes into DVs in the process of
>> > adding column files, we don't have to apply the eq-deletes we had
>> > previously on the base file.
>> >
>> > Just some thoughts on implementation:
>> >
>> >    - Write path in general: When writing the update file, we designed
>> >    this in the PoC to receive _path and _pos from the base file. With
>> >    this we can identify if some positions are missing and we can
>> >    convert them into DVs
>> >    - Trailing deletes: The tricky part is when trailing rows are
>> >    deleted.
>> >    I see 2 approaches to get around this:
>> >       - Broadcast base file row counts to writers (this is done by the
>> >       PoC): When we received the last row from the base file with pos
>> >       X, but we know there are more rows in the base file, we have to
>> >       add the trailing positions to the DV
>> >       - Enrich the input rows fed to the writer with the "_deleted"
>> >       metadata column. False => write to update file, true => write
>> >       pos to DV
>> >
>> > Regards,
>> > Gabor
>> >
>> > Amogh Jahagirdar <[email protected]> wrote (on Mon, Jun 1, 2026, 22:48):
>> >
>> > > >The real challenge comes from the read path. In the case when we
>> > > have a data file f0, an equality delete file d0, and column file f1,
>> > > and the materialized dv d1. How do we reconcile the deletes during
>> > > read? If we don't do anything special, following the existing spec
>> > > (based on sequence number rule), we would apply d0 on f0, and then
>> > > apply d1 on f1, which should still give us the correct results as
>> > > both d0 and d1 represent the same set of positions. But this is
>> > > undesired because we don't want to load and re-evaluate the old
>> > > column values. So we need a change in the spec so that in this
>> > > scenario the new d1 supersede the existing equality delete file (d0).
>> > >
>> > > So given the following invariants/rules:
>> > >
>> > > 1. In a dense representation, column updates must carry over all
>> > > active values for the column (and there's a _pos column referencing
>> > > the position from the original base file).
>> > > 2. Column updates must know what rows were deleted (either to omit
>> > > the row or materialize the default value)
>> > > 3. Data sequence numbers are updated on column appends/updates (this
>> > > would be a spec change in v4). I think reusing the same seq. number
>> > > is key since we don't have a different sequence number definition
>> > > that's temporal in dimension for delete matching and another one
>> > > that's not temporal but for column updates. Having a single sequence
>> > > number simplifies a lot of this.
>> > > 4. The requirement that a column update must also rewrite existing
>> > > equality deletes into DV
>> > >
>> > > I think this combination (and the fact that DVs are 1:1 with data
>> > > files) naturally addresses this because f1 in this example would have
>> > > the column values for all the active rows. Then the DV v1 just
>> > > deletes row positions as usual. There's never a need to actually read
>> > > the old column values in this model.
>> > >
>> > > There's a broader discussion around eliminating new equality deletes
>> > > in v4 but in that case this rule would still apply to handle older
>> > > equality deletes from v3 and earlier + column updates on older data
>> > > files as well.
>> > >
>> > > We actually talked about this a bit in today's v4 amt sync
>> > > <https://youtu.be/7mVes-6pM1c?t=861>
>> > >
>> > > Thanks,
>> > > Amogh Jahagirdar
>> > >
>> > > On Mon, Jun 1, 2026 at 12:17 PM Xiening Dai <[email protected]> wrote:
>> > >
>> > >> > but we should develop some concreteness around how feasible it is
>> > >> > for engines to produce the DVs on the column update.
>> > >>
>> > >> Actually I don't think this would be a problem. As mentioned, in
>> > >> order to generate correct column file, we already need to produce
>> > >> the correct set of deleted positions, and we just need an extra step
>> > >> to materialize these positions into DV.
>> > >>
>> > >> The real challenge comes from the read path. In the case when we
>> > >> have a data file f0, an equality delete file d0, and column file f1,
>> > >> and the materialized dv d1. How do we reconcile the deletes during
>> > >> read?
>> > >> If we don't do anything special, following the existing spec (based
>> > >> on sequence number rule), we would apply d0 on f0, and then apply d1
>> > >> on f1, which should still give us the correct results as both d0 and
>> > >> d1 represent the same set of positions. But this is undesired
>> > >> because we don't want to load and re-evaluate the old column values.
>> > >> So we need a change in the spec so that in this scenario the new d1
>> > >> supersedes the existing equality delete file (d0).
>> > >>
>> > >> On 2026/05/29 23:21:33 Amogh Jahagirdar wrote:
>> > >> > One approach that's helped me reason about all this is to treat
>> > >> > each base file as its own little mini-table inside the larger
>> > >> > table: the row range of the base file keyed by row_id, and column
>> > >> > files/deletes just layer on top. Once a row is deleted in that
>> > >> > mini-table, it stays deleted in that mini-table's state (whether
>> > >> > that's via equality deletes, or DVs), and column updates are just
>> > >> > layering changed or additional columns on top of whatever rows are
>> > >> > still there. Then I can reason about "what are desirable
>> > >> > properties of this mini-table".
>> > >> >
>> > >> > Once I look at it that way, stacking equality deletes with column
>> > >> > updates on the same column, and then forcing the write path to
>> > >> > read all the older column files when producing new column updates,
>> > >> > feels like the worst outcome; and it gets worse the more column
>> > >> > updates there are for the column. It blows up complexity and
>> > >> > performance and compromises the value of efficient column updates.
>> > >> >
>> > >> > If we eliminate that option, I think we're left with two
>> > >> > high-level approaches:
>> > >> >
>> > >> >    1. Equality deletes cannot be allowed with column updates.
>> > >> >    This simplifies both the read and write paths when column
>> > >> >    update files are present. I would generally prefer this option
>> > >> >    but there is a legitimate problem around the "how" for checking
>> > >> >    for the presence of equality deletes. We can't rely on snapshot
>> > >> >    summaries, which means we'd have to look at delete manifests to
>> > >> >    really know if equality deletes exist. There were ideas in the
>> > >> >    V4 AMT sync about constraining equality deletes to be in the
>> > >> >    root manifest; in that model, the amount of work needed to
>> > >> >    check for equality deletes is bounded by the root size. I'd
>> > >> >    keep that as a separate open question because there are other
>> > >> >    challenges with requiring equality deletes to only appear in
>> > >> >    the root manifest, especially on the upgrade path.
>> > >> >    2. After an equality delete, subsequent updates must produce a
>> > >> >    DV. As Xiening highlighted, once you've had an equality delete
>> > >> >    on a column, any subsequent updates on that column would be
>> > >> >    required to produce a DV (or positional delete) for the deleted
>> > >> >    positions at the new sequence number, making the original
>> > >> >    equality delete obsolete. This is attractive because it's not
>> > >> >    too constraining for writers: they're already doing the work of
>> > >> >    reconciling deleted positions to decide what to write into the
>> > >> >    column file, so the additional work is basically emitting the
>> > >> >    DV. The main thing to think through is how exactly the plumbing
>> > >> >    to engines looks, but in theory it's just a matter of plumbing
>> > >> >    through explicitly deleted positions (or, less ideally,
>> > >> >    inferring them from a sentinel value in the tuple).
>> > >> >
>> > >> > So far I'm leaning towards option 2, but we should develop some
>> > >> > concreteness around how feasible it is for engines to produce the
>> > >> > DVs on the column update. Again, should all be theoretically
>> > >> > possible based off plumbing deleted positions; we shouldn't let
>> > >> > implementations drive the spec but I think sniff testing the
>> > >> > practicality of it is well worth it to make sure that restriction
>> > >> > is reasonably implementable.
>> > >> >
>> > >> > Interested in hearing what others think about this one.
>> > >> >
>> > >> > Thanks,
>> > >> >
>> > >> > Amogh Jahagirdar
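
For readers following the write-path discussion quoted in this thread (identify missing _pos values and convert them into a DV, including trailing deletes via a broadcast base-file row count), here is a minimal sketch of that idea. It is only an illustration under stated assumptions: the function name `deleted_positions` and its parameters are hypothetical and not part of any Iceberg API, positions are assumed to arrive sorted and unique, and a real DV would be a bitmap rather than a Python list.

```python
from typing import Iterable, List


def deleted_positions(seen_positions: Iterable[int], base_row_count: int) -> List[int]:
    """Return the base-file positions for which the writer received no row.

    seen_positions: _pos values of the rows fed to the writer, assumed
                    sorted ascending and unique (an assumption of this sketch).
    base_row_count: total row count of the base file, assumed to be
                    broadcast to the writer.
    """
    deleted: List[int] = []
    expected = 0  # next position we expect to see if nothing was deleted
    for pos in seen_positions:
        # Any gap between the expected position and the one received
        # corresponds to deleted rows.
        deleted.extend(range(expected, pos))
        expected = pos + 1
    # Trailing deletes: positions after the last received row can only be
    # detected because the base-file row count is known.
    deleted.extend(range(expected, base_row_count))
    return deleted
```

With rows received at positions 0, 1, 3 and 4 from a base file of 7 rows, the sketch yields positions 2, 5 and 6 for the DV. Without the row count, positions 5 and 6 would be missed, which is the trailing-delete problem described above.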

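
The read-path cost raised at the top of this thread (materializing a dense column from a sparse update) can be pictured roughly as follows. This is a sketch under my own assumptions, not the Spark implementation: `materialize_dense` is a hypothetical name, the sparse update is modelled as (_pos, value) pairs, and the base column is modelled as an in-memory list.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def materialize_dense(base_values: Sequence[T],
                      sparse_update: Sequence[Tuple[int, T]]) -> List[T]:
    """Overlay a sparse column update on the base column values.

    base_values:   column values from the base file, indexed by position.
    sparse_update: (_pos, new_value) pairs for only the rows that changed.
    """
    dense = list(base_values)  # the extra copy/merge step on the read path
    for pos, value in sparse_update:
        dense[pos] = value
    return dense
```

A dense update file is the case where `sparse_update` already covers every active position, so the base values never need to be read. With a sparse file, the reader has to load the base (or older column) values and merge, which is the additional read-path step in question.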