Hi all,

Thanks for your input and for the discussion during this week’s sync meeting <https://youtu.be/PGx4-GKqm6c?si=m95DTzUnvCbS39HY>. Following up on that call, we have reached the following decisions:

*Column File Representation*

We agreed to mandate a column file representation that includes values for deleted rows, so that it aligns with the base file. We also decided to use NULL values for these deleted positions. As a result, every leaf column in a column update will be nullable, even if it was non-nullable in the base file.

*Updates on Partition Fields*

The partition tuple in the file metadata must always match the file's content. Updates are permitted, including moving a file to an unpartitioned state by unsetting its tuple.

*Next Steps*

- [Anurag]: Document the dense format and filler strategy requirements, along with recommendations for updating states in the spec.
- [All]: Please review the initial PR <https://github.com/apache/iceberg/pull/16285> for column file structs.

~ Anurag Mantripragada

On Sun, Jun 28, 2026 at 10:03 PM Anurag Mantripragada <[email protected]> wrote:

> Hi all,
>
> The arguments for mandating the dense (positional) representation are compelling, and I'm convinced. I'll add it as an agenda item for the next sync to formally confirm and close this. If anyone has remaining objections, please raise them here or during the upcoming sync.
>
> Thanks for all the input.
>
> ~ Anurag
>
> On Thu, Jun 25, 2026 at 3:23 PM Andrei Tserakhau via dev <[email protected]> wrote:
>
>> > There is a discussion about column families that has been punted for later. Separate column files for column families can be a desired and optimal long-term state.
>>
>> Agree here.
>>
>> With column families, separate column files become an intentional long-term layout, not a transient overlay that compaction folds away.
>>
>> I'd treat column families as a separate effort, and I think it's the effort that actually motivates sparse. The column-update use case is whole-column refresh - that's clearly dense.
>> The regime where sparse earns its place is long-lived files updated repeatedly in part, because there full-coverage rewrites get wasteful - and that regime is exactly column families.
>>
>> So the representation can track the feature instead of being decided up front for both:
>> - column updates: whole-column refresh -> dense.
>> - column families: persistent separate files, repeated partial updates -> where sparse earns its place, decided as part of that design.
>>
>> That keeps today's reader simple (positional, dense only), and it gives sparse a concrete trigger - it comes in with column families, as its own format update - instead of an open-ended "maybe later" that forces every reader to support it now.
>>
>> So still +1 on starting dense. I'd just frame sparse as part of the column families effort when we pick that back up, rather than a representation choice we have to settle today.
>>
>> Best,
>> Andrei
>>
>> On Fri, Jun 26, 2026 at 12:08 AM Steven Wu <[email protected]> wrote:
>>
>>> > because column files are short-lived. Compaction rewrites them back into the base files regularly, so there is no long-lived dense corpus to migrate.
>>>
>>> I don't necessarily agree that column files are short-lived. There is a discussion about column families that has been punted for later. Separate column files for column families can be a desired and optimal long-term state.
>>>
>>> I would also favor starting with the dense representation (filler values for deleted rows).
>>>
>>> On Thu, Jun 25, 2026 at 2:58 PM Andrei Tserakhau via dev <[email protected]> wrote:
>>>
>>>> +1 on picking dense as the single representation now, rather than leaving it up to the engine.
>>>>
>>>> The reason I'd mandate it, not just allow it, is the asymmetry.
>>>> Dense is the special case of sparse, so mandating the special case is the smallest thing every reader has to implement: a positional substitution, no scatter, no merge-on-read stacks. And it covers the dominant workload directly - refreshing a whole column (a new column from an expression, or overwriting an existing column with new values like embeddings or model weights) is full-coverage by nature, not a point update to a few rows.
>>>>
>>>> Key thing here is that going dense now does not close the door on sparse.
>>>>
>>>> A sparse-capable reader is a superset of a dense one - it can read dense files too, since full coverage is just sparse with every position present. So adding sparse later is an additive format version: it widens the reader, it does not break existing dense files, and tables that never need sparse never pay for it. The reverse is not true. If we allow sparse now, every engine and client has to implement the harder merge-on-read path from day one, for row-level partial updates the current workloads are not asking for.
>>>>
>>>> On whether dense is a one-way door: I don't think it is, because column files are short-lived. Compaction rewrites them back into the base files regularly, so there is no long-lived dense corpus to migrate. If we add sparse later, old dense files age out through normal compaction, or we rewrite them - and because they are transient, that cost is bounded and amortized, not a table-wide migration.
>>>>
>>>> So my preference is to specify dense now and keep sparse as a documented future extension with its own format version, rather than leaving the representation unspecified. Leaving it open is the worst of the three: as Peter pointed out, it forces every reader to support sparse anyway, which is the exact cost we would be trying to defer.
>>>>
>>>> Best,
>>>> Andrei
>>>>
>>>> On Thu, Jun 25, 2026 at 7:58 PM Steven Wu <[email protected]> wrote:
>>>>
>>>>> > We can, and in the PoC this is what we do, broadcast the "location -> record count" mapping to the writers for this.
>>>>>
>>>>> I am wondering if column file generation usually needs to scan the existing base files (or column files) anyway. Otherwise, a default value column (with expressions) should probably be sufficient. So the writer probably already has the data file metadata.
>>>>>
>>>>> Plus, carrying over additional contextual information (like manifest file location and entry position) is very beneficial, as the snapshot producer can generate manifest DVs efficiently without scanning manifest files to locate the old TrackedFile entry to delete (maybe via manifest DV).
>>>>>
>>>>> > Alternatively, when scanning inputs for the writers, we can also query the '_deleted' metadata column.
>>>>>
>>>>> I agree; this is another nice way to solve this problem assuming the base file or older column files need to be scanned.
>>>>>
>>>>> On Wed, Jun 24, 2026 at 11:15 PM Gábor Kaszab <[email protected]> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I share Steven's opinion that the cons against the dense representation aren't that strong, and the implementation seems more straightforward across projects and languages, if we keep an invariant to have all the rows (even the deleted ones with auxiliary values) in the column file.
>>>>>>
>>>>>> 1) Off stats
>>>>>> I can just +1 on the stats part. They can be "fixed" so that the filler values do not throw them off, but the stats are off already anyway due to deletes, so I'm not sure this is something we want to fix.
>>>>>>
>>>>>> 2) More data due to filler values
>>>>>> TLDR: there is no significant difference between sparse and dense in storage size.
>>>>>>
>>>>>> The reason is the compression efficiency for the _pos column. I made some experiments on this front and encodings can help the dense representation. The more rows we delete, the more auxiliary values we have to use with the dense representation, this is true. On the other hand, the more rows we delete, the worse the compression of the _pos column is for the sparse representation (assuming Parquet V2) due to holes in the sequence. The overhead of the missing positions for sparse seems to balance out the overhead of the presence of auxiliary values for dense.
>>>>>>
>>>>>> 3) We have to know the record count of the base files in the writer
>>>>>> I don't think this information is currently available in the writer. We can, and in the PoC this is what we do, broadcast the "location -> record count" mapping to the writers for this.
>>>>>>
>>>>>> Alternatively, when scanning inputs for the writers, we can also query the '_deleted' metadata column. Using that we don't even have to broadcast the record counts.
>>>>>>
>>>>>> Summary:
>>>>>> I think none of the cons for dense are deal breakers and I'm in favor of supporting a single representation. My preference is dense.
>>>>>>
>>>>>> Best Regards,
>>>>>> Gabor
>>>>>>
>>>>>> Steven Wu <[email protected]> wrote (on Wed, Jun 24, 2026, 22:59):
>>>>>>
>>>>>>> I agree with Peter's points here. While it seems flexible to have both options, it essentially requires every engine/client to implement the more complex read of sparse representation.
>>>>>>>
>>>>>>> I want to revisit the cons that Anurag summarized for option 1 (filler values for deleted rows).
>>>>>>> To me, those arguments against filler values seem relatively weak, and the pros (zero-copy stitching, simpler reader implementation) outweigh the cons.
>>>>>>>
>>>>>>> > Filler values at deleted positions skew Parquet footer statistics (null_count, avg_length)
>>>>>>>
>>>>>>> Writers can produce accurate statistics in the Iceberg metadata even with filler values. I know the Java reference implementation currently just takes the column stats from the Parquet writer. But some writer implementations may choose to produce accurate stats in this case.
>>>>>>>
>>>>>>> There was also a concern that differing column statistics between Iceberg metadata and the Parquet footer, caused by DVs, could be confusing. I want to argue that this difference is actually reasonable. DV is a table level concept. With DVs, Iceberg metadata can have different and adjusted column stats compared to the Parquet footer. Parquet is not aware of DVs, and the Parquet footer only captures the stats for the content in the physical file.
>>>>>>>
>>>>>>> Today, we already have inaccurate stats with DVs. It is not a correctness problem, it may have a small performance impact on pruning. Even if writer implementations do nothing special for column files, we are no worse off than today.
>>>>>>>
>>>>>>> > Writes slightly more data than necessary (filler values for deleted rows)
>>>>>>>
>>>>>>> This depends on the percentage of deleted rows. Sparse representation also has some small overhead for storing the encoded positions (even with delta encoding).
>>>>>>>
>>>>>>> > Writer must know base_file.record_count to pad trailing deletions (base file metadata already available during write planning)
>>>>>>>
>>>>>>> As already pointed out, the base file metadata already has the row count.
>>>>>>> So it is not really a problem.
>>>>>>>
>>>>>>> On Tue, Jun 23, 2026 at 9:05 AM Péter Váry <[email protected]> wrote:
>>>>>>>
>>>>>>>> I still have concerns with this decision:
>>>>>>>> > Implementation Details: Specific writer implementation details such as choosing between dense or sparse representations will be left to individual engines.
>>>>>>>> > Specification Scope: The specification will not mandate these internal implementation choices, provided that engines adhere to writing the explicit *_pos* column.
>>>>>>>>
>>>>>>>> If we do not specify whether the representation should be dense or sparse, we are effectively requiring all engines to support the sparse representation, since the dense representation is just a special case of the sparse one.
>>>>>>>> In practice, this means every implementation must be able to materialize a dense representation from the sparse form, similar to what the current Spark implementation does today. While this is certainly feasible, it introduces an additional step on the read path, which is often performance-sensitive. This concern has been raised consistently by representatives of other Iceberg implementations, and I have not heard a different perspective from them so far.
>>>>>>>>
>>>>>>>> That said, if the broader group is comfortable accepting this trade-off, I do not have any further objections to the proposal.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> Anurag Mantripragada <[email protected]> wrote (on Tue, Jun 16, 2026, 20:51):
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> It seems this thread has become conflated with the metadata representation discussion <https://lists.apache.org/thread/7jryw9dfvc02s411twn4o7s5gjrybfxg>.
>>>>>>>>> While all the points raised here are noted, let’s continue those specific parts of the conversation in the metadata thread.
>>>>>>>>>
>>>>>>>>> Regarding data representation, we discussed the following during this <https://www.youtube.com/watch?v=kuxFBm-j5hw&t=3s> sync:
>>>>>>>>>
>>>>>>>>> - Implementation Details: Specific writer implementation details such as choosing between dense or sparse representations will be left to individual engines.
>>>>>>>>> - Specification Scope: The specification will not mandate these internal implementation choices, provided that engines adhere to writing the explicit *_pos* column.
>>>>>>>>>
>>>>>>>>> Please let me know if you have concerns.
>>>>>>>>>
>>>>>>>>> ~ Anurag
>>>>>>>>>
>>>>>>>>> On Tue, Jun 2, 2026 at 11:44 AM Xiening Dai <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> We also need to think about the DV only case.
>>>>>>>>>>
>>>>>>>>>> If we have f0 with dv0, then we do column update and generate f1. Do we also bump the sequence number for f0 in this case? There are multiple options:
>>>>>>>>>>
>>>>>>>>>> 1) We bump the sequence number, then we will need to copy dv0 into dv1 and assign the same sequence number to dv1 so that the delete positions won't get lost.
>>>>>>>>>> 2) We don't bump the sequence number, then we don't need to re-write dv0 and everything would remain working. But this creates a small inconsistency with eq delete case, and requires a special case handling at write path.
>>>>>>>>>> 3) We bump sequence number for both data file f0, and dv0. We don't need to rewrite dv, but instead we bump the sequence number for the dv as well.
>>>>>>>>>>
>>>>>>>>>> I'd suggest we write down these details into a spec change proposal and examine the read/write workflow carefully.
>>>>>>>>>>
>>>>>>>>>> On 2026/06/02 12:42:10 Gábor Kaszab wrote:
>>>>>>>>>> > Thanks for the summary, Amogh!
>>>>>>>>>> >
>>>>>>>>>> > I think the missing building block to make this eq-delete rewrite work is the decision made yesterday, to bump the base file-level sequence number when adding a column file. With this, we can make sure that after we have rewritten the eq-deletes into DVs in the process of adding column files, we don't have to apply the eq-deletes we had previously on the base file.
>>>>>>>>>> >
>>>>>>>>>> > Just some thoughts on implementation:
>>>>>>>>>> >
>>>>>>>>>> > - Write path in general: When writing the update file, we designed this in the PoC to receive _path and _pos from the base file. With this we can identify if some positions are missing and we can convert them into DVs
>>>>>>>>>> > - Trailing deletes: The tricky part is when trailing rows are deleted. I see 2 approaches to get around this:
>>>>>>>>>> >   - Broadcast base file row counts to writers (this is done by the PoC): When we received the last row from the base file with pos X, but we know there are more rows in the base file, we have to add the trailing positions to the DV
>>>>>>>>>> >   - Enrich the input rows fed to the writer with the "_deleted" metadata column. False => write to update file, true => write pos to DV
>>>>>>>>>> >
>>>>>>>>>> > Regards,
>>>>>>>>>> > Gabor
>>>>>>>>>> >
>>>>>>>>>> > Amogh Jahagirdar <[email protected]> wrote (on Mon, Jun 1, 2026, 22:48):
>>>>>>>>>> >
>>>>>>>>>> > > >The real challenge comes from the read path. In the case when we have a data file f0, an equality delete file d0, and column file f1, and the materialized dv d1. How do we reconcile the deletes during read? If we don't do anything special, following the existing spec (based on sequence number rule), we would apply d0 on f0, and then apply d1 on f1, which should still give us the correct results as both d0 and d1 represent the same set of positions. But this is undesired because we don't want to load and re-evaluate the old column values. So we need a change in the spec so that in this scenario the new d1 supersedes the existing equality delete file (d0).
>>>>>>>>>> > >
>>>>>>>>>> > > So given the following invariants/rules:
>>>>>>>>>> > >
>>>>>>>>>> > > 1. In a dense representation, column updates must carry over all active values for the column (and there's a _pos column referencing the position from the original base file).
>>>>>>>>>> > > 2. Column updates must know what rows were deleted (either to omit the row or materialize the default value)
>>>>>>>>>> > > 3. Data sequence numbers are updated on column appends/updates (this would be a spec change in v4). I think reusing the same seq. number is key since we don't have a different sequence number definition that's temporal in dimension for delete matching and another one that's not temporal but for column updates. Having a single sequence number simplifies a lot of this.
>>>>>>>>>> > > 4. The requirement that a column update must also rewrite existing equality deletes into DV
>>>>>>>>>> > >
>>>>>>>>>> > > I think this combination (and the fact that DVs are 1:1 with data files) naturally addresses this because f1 in this example would have the column values for all the active rows. Then the DV v1 just deletes row positions as usual. There's never a need to actually read the old column values in this model.
>>>>>>>>>> > >
>>>>>>>>>> > > There's a broader discussion around eliminating new equality deletes in v4 but in that case this rule would still apply to handle older equality deletes from v3 and earlier + column updates on older data files as well.
>>>>>>>>>> > >
>>>>>>>>>> > > We actually talked about this a bit in today's v4 amt sync <https://youtu.be/7mVes-6pM1c?t=861>
>>>>>>>>>> > >
>>>>>>>>>> > > Thanks,
>>>>>>>>>> > > Amogh Jahagirdar
>>>>>>>>>> > >
>>>>>>>>>> > > On Mon, Jun 1, 2026 at 12:17 PM Xiening Dai <[email protected]> wrote:
>>>>>>>>>> > >
>>>>>>>>>> > >> > but we should develop some concreteness around how feasible it is for engines to produce the DVs on the column update.
>>>>>>>>>> > >>
>>>>>>>>>> > >> Actually I don't think this would be a problem. As mentioned, in order to generate a correct column file, we already need to produce the correct set of deleted positions, and we just need an extra step to materialize these positions into DV.
>>>>>>>>>> > >>
>>>>>>>>>> > >> The real challenge comes from the read path. In the case when we have a data file f0, an equality delete file d0, and column file f1, and the materialized dv d1.
>>>>>>>>>> > >> How do we reconcile the deletes during read? If we don't do anything special, following the existing spec (based on sequence number rule), we would apply d0 on f0, and then apply d1 on f1, which should still give us the correct results as both d0 and d1 represent the same set of positions. But this is undesired because we don't want to load and re-evaluate the old column values. So we need a change in the spec so that in this scenario the new d1 supersedes the existing equality delete file (d0).
>>>>>>>>>> > >>
>>>>>>>>>> > >> On 2026/05/29 23:21:33 Amogh Jahagirdar wrote:
>>>>>>>>>> > >> > One approach that’s helped me reason about all this is to treat each base file as its own little mini‑table inside the larger table: the row range of the base file keyed by row_id, and column files/deletes just layer on top. Once a row is deleted in that mini‑table, it stays deleted in that mini‑table’s state (whether that’s via equality deletes, or DVs), and column updates are just layering changed or additional columns on top of whatever rows are still there. Then I can reason about "what are desirable properties of this mini-table".
>>>>>>>>>> > >> >
>>>>>>>>>> > >> > Once I look at it that way, stacking equality deletes with column updates on the same column, and then forcing the write path to read all the older column files when producing new column updates, feels like the worst outcome; and it gets worse the more column updates there are for the column. It blows up complexity and performance and compromises the value of efficient column updates.
>>>>>>>>>> > >> >
>>>>>>>>>> > >> > If we eliminate that option, I think we’re left with two high‑level approaches:
>>>>>>>>>> > >> >
>>>>>>>>>> > >> > 1. Equality deletes cannot be allowed with column updates. This simplifies both the read and write paths when column update files are present. I would generally prefer this option but there is a legitimate problem around the “how” for checking for the presence of equality deletes. We can’t rely on snapshot summaries, which means we’d have to look at delete manifests to really know if equality deletes exist. There were ideas in the V4 AMT sync about constraining equality deletes to be in the root manifest; in that model, the amount of work needed to check for equality deletes is bounded by the root size.
>>>>>>>>>> > >> >    I’d keep that as a separate open question because there are other challenges with requiring equality deletes to only appear in the root manifest, especially on the upgrade path.
>>>>>>>>>> > >> > 2. After an equality delete, subsequent updates must produce a DV. As Xiening highlighted, once you’ve had an equality delete on a column, any subsequent updates on that column would be required to produce a DV (or positional delete) for the deleted positions at the new sequence number, making the original equality delete obsolete. This is attractive because it’s not too constraining for writers: they’re already doing the work of reconciling deleted positions to decide what to write into the column file, so the additional work is basically emitting the DV. The main thing to think through is how exactly the plumbing to engines looks, but in theory it’s just a matter of plumbing through explicitly deleted positions (or, less ideally, inferring them from a sentinel value in the tuple).
>>>>>>>>>> > >> >
>>>>>>>>>> > >> > So far I’m leaning towards option 2, but we should develop some concreteness around how feasible it is for engines to produce the DVs on the column update.
>>>>>>>>>> > >> > Again, should all be theoretically possible based off plumbing deleted positions; we shouldn't let implementations drive the spec but I think sniff testing the practicality of it is well worth it to make sure that restriction is reasonably implementable.
>>>>>>>>>> > >> >
>>>>>>>>>> > >> > Interested in hearing what others think about this one.
>>>>>>>>>> > >> >
>>>>>>>>>> > >> > Thanks,
>>>>>>>>>> > >> > Amogh Jahagirdar
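
For anyone skimming the thread, here is a rough Python sketch of the two writer-side approaches discussed above for finding deleted positions (including trailing deletes) so they can be written to a DV. It is only an illustration under simplifying assumptions: positions are plain integers, rows are tuples, and the function names `missing_positions` and `split_by_deleted_flag` are hypothetical, not Iceberg APIs or the PoC code.

```python
from typing import Dict, Iterable, List, Tuple


def missing_positions(seen_positions: Iterable[int], record_count: int) -> List[int]:
    """Approach 1: the writer is told the base file's record count.

    Any position in [0, record_count) that never reached the writer is treated
    as deleted, including trailing positions past the last one seen.
    """
    seen = set(seen_positions)
    return [pos for pos in range(record_count) if pos not in seen]


def split_by_deleted_flag(
    rows: Iterable[Tuple[int, object, bool]],
) -> Tuple[Dict[int, object], List[int]]:
    """Approach 2: each input row is (pos, value, deleted).

    Live rows are routed to the column file, deleted positions to the DV,
    so no record count has to be broadcast.
    """
    live: Dict[int, object] = {}
    dv_positions: List[int] = []
    for pos, value, deleted in rows:
        if deleted:
            dv_positions.append(pos)
        else:
            live[pos] = value
    return live, dv_positions


if __name__ == "__main__":
    # Base file has 6 rows; the writer only received positions 0, 1 and 3.
    print(missing_positions([0, 1, 3], record_count=6))
    # Same idea, driven by a per-row deleted flag instead of a record count.
    print(split_by_deleted_flag([(0, "a", False), (1, "b", True), (2, "c", False)]))
```

The first function needs the record count to catch trailing deletes; the second does not, at the price of scanning the input with the deleted flag attached.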

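
To make the dense decision at the top of this thread concrete, the following is a minimal Python sketch of a dense column with NULL fillers and of the positional substitution a reader would perform. It models rows as dictionaries and NULL as `None`; `build_dense_column` and `stitch` are hypothetical names for illustration and say nothing about how any engine actually implements this.

```python
from typing import Dict, List, Optional, Set


def build_dense_column(
    record_count: int, live_values: Dict[int, object]
) -> List[Optional[object]]:
    """Produce one value per base-file row; deleted positions stay None (NULL)."""
    dense: List[Optional[object]] = [None] * record_count
    for pos, value in live_values.items():
        if not 0 <= pos < record_count:
            raise ValueError(f"position {pos} is outside the base file")
        dense[pos] = value
    return dense


def stitch(
    base_rows: List[dict],
    column: str,
    dense_column: List[Optional[object]],
    deleted: Set[int],
) -> List[dict]:
    """Positional substitution: row i takes dense_column[i]; deleted rows are skipped."""
    if len(dense_column) != len(base_rows):
        raise ValueError("a dense column file must cover every base-file position")
    return [
        {**row, column: dense_column[pos]}
        for pos, row in enumerate(base_rows)
        if pos not in deleted
    ]


if __name__ == "__main__":
    base = [{"id": i, "score": 0} for i in range(4)]
    # Rows 1 and 3 are deleted, so the update carries NULL at those positions.
    dense = build_dense_column(4, {0: 10, 2: 30})
    print(dense)
    print(stitch(base, "score", dense, deleted={1, 3}))
```

Because the column has exactly one entry per base-file position, the reader needs no scatter or merge step, which is the simplicity argument made for dense in the thread. It also shows why every updated leaf column becomes nullable.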