I think it is great to explore alternatives, but I still feel we shouldn't deprecate equality deletes until we have a clear path forward.
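For reference, the delta + base ("FastUpdateTable") pattern Russell sketches below can already be prototyped with a view plus a periodic MERGE. A minimal sketch through the Spark Java API, assuming the Iceberg Spark runtime and SQL extensions are available, and using hypothetical tables orders_base and orders_delta keyed by id, with a payload column data and an is_delete flag on delta rows (per Russell's outline, writes would only ever MERGE into orders_delta):

import org.apache.spark.sql.SparkSession;

public class DeltaBaseSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("delta-base-sketch").getOrCreate();

    // "On All Scans": delta rows shadow base rows that carry the same primary key.
    spark.sql(
        "CREATE OR REPLACE TEMP VIEW orders_live AS "
            + "SELECT id, data FROM orders_delta WHERE NOT is_delete "
            + "UNION ALL "
            + "SELECT b.id, b.data FROM orders_base b "
            + "LEFT ANTI JOIN orders_delta d ON b.id = d.id");

    // "On Delta Table Size > Max Delta Size": fold the delta into the base,
    // then clear the folded records from the delta.
    spark.sql(
        "MERGE INTO orders_base b USING orders_delta d ON b.id = d.id "
            + "WHEN MATCHED AND d.is_delete THEN DELETE "
            + "WHEN MATCHED THEN UPDATE SET b.data = d.data "
            + "WHEN NOT MATCHED AND NOT d.is_delete THEN INSERT (id, data) VALUES (d.id, d.data)");
    spark.sql("DELETE FROM orders_delta");

    spark.stop();
  }
}

The anti-join is what implements "if Delta has a record for a given primary key, discard the base record"; keeping the delta small keeps both the view and the fold cheap.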
> On Nov 19, 2024, at 7:56 AM, Russell Spitzer <russell.spit...@gmail.com> > wrote: > > I'm strongly in favor of moving to the Delta + Base table approach discussed > in the cookbook above. I wonder if we should codify that into something more > standardized but it seems to me to be a much better path forward. I'm not > sure we need to support this at the spec level but it would be nice if we > could provide a table that was automatically broken into sub-tables and had > well-defined operations on it. > > For example: > > FastUpdateTable: > Requires: > Primary Key Columns > Long Max Delta Size > Contains: > Private Iceberg Table: Delta > Private Iceberg Table: Base > > On All Scans - > Return a view which joins delta and base on primary key; if Delta has > a record for a given primary key, discard the base record > > On All Writes - > Perform all writes against the delta table, only MERGE is allowed. > Append is forbidden (No PK Guarantees). Only position deletes are allowed. > > On Delta Table Size > Max Delta Size - > Upsert DELTA into BASE > Clear upserted records from Delta > > > If the Delta Table size is kept small I think this would be almost as > performant as Equality deletes but still be compatible with row-lineage and > other indexing features. > > > On Tue, Nov 19, 2024 at 7:12 AM Manu Zhang <owenzhang1...@gmail.com > <mailto:owenzhang1...@gmail.com>> wrote: >> Hi Ajantha, >> >> I'm proposing exploring a view-based approach similar to the >> changelog-mirror table pattern[1] rather than supporting delta writers for >> the Kafka connect Iceberg sink. >> >> 1. >> https://www.tabular.io/apache-iceberg-cookbook/data-engineering-cdc-table-mirroring/ >> >> On Tue, Nov 19, 2024 at 7:38 PM Jean-Baptiste Onofré <j...@nanthrax.net >> <mailto:j...@nanthrax.net>> wrote: >>> I don’t think it’s a problem while an alternative is explored (the JDK >>> itself does that pretty often). >>> So it’s up to the community: of course I’m against removing it without a >>> solid alternative, but deprecation is fine imho. >>> >>> Regards >>> JB >>> >>> On Tue, Nov 19, 2024 at 12:19, Ajantha Bhat <ajanthab...@gmail.com >>> <mailto:ajanthab...@gmail.com>> wrote: >>>>> - ok to deprecate equality deletes >>>>> - not ok to remove it >>>> >>>> @JB: I don't think it is a good idea to use deprecated functionality in >>>> new feature development. >>>> Hence, my specific question was about the kafka connect upsert operation. >>>> >>>> @Manu: I meant the delta writers for the kafka connect Iceberg sink (which in >>>> turn are used for upserting the CDC records) >>>> https://github.com/apache/iceberg/issues/10842 >>>> >>>> >>>> - Ajantha >>>> >>>> >>>> >>>> On Tue, Nov 19, 2024 at 3:08 PM Manu Zhang <owenzhang1...@gmail.com >>>> <mailto:owenzhang1...@gmail.com>> wrote: >>>>> I second Anton's proposal to standardize on a view-based approach to >>>>> handle CDC cases. >>>>> Actually, it's already been explored in detail[1] by Jack before. >>>>> >>>>> [1] Improving Change Data Capture Use Case for Apache Iceberg >>>>> <https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit?tab=t.0#heading=h.94xnx4qg3bnt> >>>>> >>>>> >>>>> On Tue, Nov 19, 2024 at 4:16 PM Jean-Baptiste Onofré <j...@nanthrax.net >>>>> <mailto:j...@nanthrax.net>> wrote: >>>>>> My proposal is the following (already expressed): >>>>>> - ok to deprecate equality deletes >>>>>> - not ok to remove it >>>>>> - work on position deletes improvements to address streaming use cases. >>>>>> I think we should explore different approaches. 
Personally I think a >>>>>> possible approach would be to find a way to index data files to avoid a full >>>>>> scan to find row positions. >>>>>> >>>>>> My $0.01 :) >>>>>> >>>>>> Regards >>>>>> JB >>>>>> >>>>>> On Tue, Nov 19, 2024 at 07:53, Ajantha Bhat <ajanthab...@gmail.com >>>>>> <mailto:ajanthab...@gmail.com>> wrote: >>>>>>> Hi, What's the conclusion on this thread? >>>>>>> >>>>>>> Users are looking for Upsert (CDC) support for the OSS Iceberg kafka >>>>>>> connect sink. >>>>>>> We only support appends at the moment. Can we go ahead and implement >>>>>>> the upserts using equality deletes? >>>>>>> >>>>>>> >>>>>>> - Ajantha >>>>>>> >>>>>>> On Sun, Nov 10, 2024 at 11:56 AM Vignesh <vignesh.v...@gmail.com >>>>>>> <mailto:vignesh.v...@gmail.com>> wrote: >>>>>>>> Hi, >>>>>>>> I am reading about Iceberg and am quite new to this. >>>>>>>> This puffin would be an index from key to data file. Other use cases >>>>>>>> of Puffin, such as statistics, are at a per-file level if I understand >>>>>>>> correctly. >>>>>>>> >>>>>>>> Where would the puffin about key->data file be stored? It is a >>>>>>>> property of the entire table. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vignesh. >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Nov 9, 2024 at 2:17 AM Shani Elharrar >>>>>>>> <sh...@upsolver.com.invalid> wrote: >>>>>>>>> JB, this is what we do, we write Equality Deletes and periodically >>>>>>>>> convert them to Positional Deletes. >>>>>>>>> >>>>>>>>> We could probably index the keys, maybe partially index using bloom >>>>>>>>> filters; the best would be to put those bloom filters inside Puffin. >>>>>>>>> >>>>>>>>> Shani. >>>>>>>>> >>>>>>>>>> On 9 Nov 2024, at 11:11, Jean-Baptiste Onofré <j...@nanthrax.net >>>>>>>>>> <mailto:j...@nanthrax.net>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I agree with Peter here, and I would say that it would be an issue >>>>>>>>>> for multi-engine support. >>>>>>>>>> >>>>>>>>>> I think, as I already mentioned with others, we should explore an >>>>>>>>>> alternative. >>>>>>>>>> As the main issue is the datafile scan in a streaming context, maybe >>>>>>>>>> we could find a way to "index"/correlate positions for positional deletes with >>>>>>>>>> limited scanning. >>>>>>>>>> I will think again about that :) >>>>>>>>>> >>>>>>>>>> Regards >>>>>>>>>> JB >>>>>>>>>> >>>>>>>>>> On Sat, Nov 9, 2024 at 6:48 AM Péter Váry >>>>>>>>>> <peter.vary.apa...@gmail.com <mailto:peter.vary.apa...@gmail.com>> >>>>>>>>>> wrote: >>>>>>>>>>> Hi Imran, >>>>>>>>>>> >>>>>>>>>>> I don't think it's a good idea to start creating multiple types of >>>>>>>>>>> Iceberg tables. Iceberg's main selling point is compatibility >>>>>>>>>>> between engines. If we don't have readers and writers for all types >>>>>>>>>>> of tables, then we remove compatibility from the equation and >>>>>>>>>>> engine-specific formats always win. OTOH, if we write readers and >>>>>>>>>>> writers for all types of tables then we are back to square one. >>>>>>>>>>> >>>>>>>>>>> Identifier fields are a table schema concept and are used in many cases >>>>>>>>>>> during query planning and execution. This is why they are defined >>>>>>>>>>> as part of the SQL spec, and this is why Iceberg defines them as >>>>>>>>>>> well. One use case is where they can be used to merge deletes >>>>>>>>>>> (independently of how they are manifested) and subsequent inserts >>>>>>>>>>> into updates. 
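Tying together the indexing ideas above (JB's "index data files to avoid a full scan", Shani's bloom filters inside Puffin, Vignesh's key -> data file question), here is a rough sketch of a per-data-file key filter. It is illustrative only: the class and method names are made up, the filters are kept in memory rather than persisted as Puffin blobs, and none of this is an existing Iceberg API. The point is that a delete keyed on a primary key only needs to scan the few files whose filter reports a possible match in order to find row positions:

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyIndexSketch {
  // One bloom filter of primary-key values per data file; in Iceberg terms this
  // could be persisted as a Puffin blob next to the file's other statistics.
  private final Map<String, BloomFilter<CharSequence>> keyFilterByDataFile = new HashMap<>();

  /** Called after a data file is committed. */
  void index(String dataFilePath, List<String> primaryKeysInFile) {
    BloomFilter<CharSequence> filter = BloomFilter.create(
        Funnels.stringFunnel(StandardCharsets.UTF_8), primaryKeysInFile.size(), 0.01);
    primaryKeysInFile.forEach(filter::put);
    keyFilterByDataFile.put(dataFilePath, filter);
  }

  /** Data files that might contain the key; only these need to be scanned for row positions. */
  List<String> candidateFiles(String primaryKey) {
    List<String> candidates = new ArrayList<>();
    keyFilterByDataFile.forEach((file, filter) -> {
      if (filter.mightContain(primaryKey)) { // false positives possible, never false negatives
        candidates.add(file);
      }
    });
    return candidates;
  }
}

Where such filters would live (Puffin blobs per data file, writer state, or something table-wide) is exactly the open question raised above.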
>>>>>>>>>>> >>>>>>>>>>> Flink SQL doesn't allow creating tables with partition transforms, >>>>>>>>>>> so no new table could be created by Flink SQL using transforms, but >>>>>>>>>>> tables created by other engines could still be used (both read and >>>>>>>>>>> write). Also you can create such tables in Flink using the Java API. >>>>>>>>>>> >>>>>>>>>>> Requiring partition columns to be part of the identifier fields >>>>>>>>>>> comes from the practical consideration that you want to limit the >>>>>>>>>>> scope of the equality deletes as much as possible. Otherwise all of >>>>>>>>>>> the equality deletes would have to be table global, and they would have to be >>>>>>>>>>> read by every reader. We could write those; we just decided that we >>>>>>>>>>> don't want to allow the user to do this, as it is in most cases a bad >>>>>>>>>>> idea. >>>>>>>>>>> >>>>>>>>>>> I hope this helps, >>>>>>>>>>> Peter >>>>>>>>>>> >>>>>>>>>>> On Fri, Nov 8, 2024, 22:01 Imran Rashid >>>>>>>>>>> <iras...@cloudera.com.invalid> wrote: >>>>>>>>>>>> I'm not down in the weeds at all myself on implementation details, >>>>>>>>>>>> so forgive me if I'm wrong about the details here. >>>>>>>>>>>> >>>>>>>>>>>> I can see all the viewpoints -- both that equality deletes enable >>>>>>>>>>>> some use cases, but also make others far more difficult. What >>>>>>>>>>>> surprised me the most is that Iceberg does not provide a way to >>>>>>>>>>>> distinguish these two table "types". >>>>>>>>>>>> >>>>>>>>>>>> At first, I thought the presence of an identifier-field >>>>>>>>>>>> (https://iceberg.apache.org/spec/#identifier-field-ids) indicated >>>>>>>>>>>> that the table was a target for equality deletes. But, then it >>>>>>>>>>>> turns out identifier-fields are also useful for changelog views >>>>>>>>>>>> even without equality deletes -- IIUC, they show that a delete + >>>>>>>>>>>> insert should actually be interpreted as an update in a changelog >>>>>>>>>>>> view. >>>>>>>>>>>> >>>>>>>>>>>> To be perfectly honest, I'm confused about all of these details -- >>>>>>>>>>>> from my read, the spec does not indicate this relationship between >>>>>>>>>>>> identifier-fields and equality_ids in equality delete files >>>>>>>>>>>> (https://iceberg.apache.org/spec/#equality-delete-files), but I >>>>>>>>>>>> think that is the way Flink works. Flink itself seems to have >>>>>>>>>>>> even more limitations -- no partition transforms are allowed, and >>>>>>>>>>>> all partition columns must be a subset of the identifier fields. >>>>>>>>>>>> Is that just a Flink limitation, or is that the intended behavior >>>>>>>>>>>> in the spec? (Or maybe user-error on my part?) Those seem like >>>>>>>>>>>> very reasonable limitations, from an implementation point-of-view. >>>>>>>>>>>> But OTOH, as a user, this seems to be directly contrary to some >>>>>>>>>>>> of the promises of Iceberg. >>>>>>>>>>>> >>>>>>>>>>>> It's easy to see if a table already has equality deletes in it, by >>>>>>>>>>>> looking at the metadata. But is there any way to indicate that a >>>>>>>>>>>> table (or branch of a table) _must not_ have equality deletes >>>>>>>>>>>> added to it? >>>>>>>>>>>> >>>>>>>>>>>> If that were possible, it seems like we could support both use >>>>>>>>>>>> cases. We could continue to optimize for the streaming ingestion >>>>>>>>>>>> use cases using equality deletes. But we could also build more >>>>>>>>>>>> optimizations into the "non-streaming-ingestion" branches. And we >>>>>>>>>>>> could document the tradeoff so it is much clearer to end users. 
>>>>>>>>>>>> >>>>>>>>>>>> To maintain compatibility, I suppose that the change would be that >>>>>>>>>>>> equality deletes continue to be allowed by default, but we'd add a >>>>>>>>>>>> new field to indicate that for some tables (or branches of a >>>>>>>>>>>> table), equality deletes would not be allowed. And it would be an >>>>>>>>>>>> error for an engine to make an update which added an equality >>>>>>>>>>>> delete to such a table. >>>>>>>>>>>> >>>>>>>>>>>> Maybe that change would even be possible in V3. >>>>>>>>>>>> >>>>>>>>>>>> And if all the performance improvements to equality deletes make >>>>>>>>>>>> this a moot point, we could drop the field in v4. But it seems >>>>>>>>>>>> like a mistake to both limit the non-streaming use-case AND have >>>>>>>>>>>> confusing limitations for the end-user in the meantime. >>>>>>>>>>>> >>>>>>>>>>>> I would happily be corrected about my understanding of all of the >>>>>>>>>>>> above. >>>>>>>>>>>> >>>>>>>>>>>> thanks! >>>>>>>>>>>> Imran >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Nov 5, 2024 at 9:16 AM Bryan Keller <brya...@gmail.com >>>>>>>>>>>> <mailto:brya...@gmail.com>> wrote: >>>>>>>>>>>>> I also feel we should keep equality deletes until we have an >>>>>>>>>>>>> alternative solution for streaming updates/deletes. >>>>>>>>>>>>> >>>>>>>>>>>>> -Bryan >>>>>>>>>>>>> >>>>>>>>>>>>>> On Nov 4, 2024, at 8:33 AM, Péter Váry >>>>>>>>>>>>>> <peter.vary.apa...@gmail.com >>>>>>>>>>>>>> <mailto:peter.vary.apa...@gmail.com>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Well, it seems like I'm a little late, so most of the arguments >>>>>>>>>>>>>> are voiced. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I agree that we should not deprecate the equality deletes until >>>>>>>>>>>>>> we have a replacement feature. >>>>>>>>>>>>>> I think one of the big advantages of Iceberg is that it supports >>>>>>>>>>>>>> batch processing and streaming ingestion too. >>>>>>>>>>>>>> For streaming ingestion we need a way to update existing data in >>>>>>>>>>>>>> a performant way, but restricting deletes for the primary keys >>>>>>>>>>>>>> seems like enough from the streaming perspective. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Equality deletes allow a very wide range of applications, which >>>>>>>>>>>>>> we might be able to narrow down a bit, but still keep useful. So >>>>>>>>>>>>>> if we want to go down this road, we need to start collecting the >>>>>>>>>>>>>> requirements. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Peter >>>>>>>>>>>>>> >>>>>>>>>>>>>> Shani Elharrar <sh...@upsolver.com.invalid> ezt írta (időpont: >>>>>>>>>>>>>> 2024. nov. 1., P, 19:22): >>>>>>>>>>>>>>> I understand how it makes sense for batch jobs, but it damages >>>>>>>>>>>>>>> stream jobs, using equality deletes works much better for >>>>>>>>>>>>>>> streaming (which have a strict SLA for delays), and in order to >>>>>>>>>>>>>>> decrease the performance penalty - systems can rewrite the >>>>>>>>>>>>>>> equality deletes to positional deletes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Shani. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com >>>>>>>>>>>>>>>> <mailto:stevenz...@gmail.com>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Fundamentally, it is very difficult to write position deletes >>>>>>>>>>>>>>>> with concurrent writers and conflicts for batch jobs too, as >>>>>>>>>>>>>>>> the inverted index may become invalid/stale. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The position deletes are created during the write phase. But >>>>>>>>>>>>>>>> conflicts are only detected at the commit stage. 
I assume the >>>>>>>>>>>>>>>> batch job should fail in this case. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Nov 1, 2024 at 10:57 AM Steven Wu >>>>>>>>>>>>>>>> <stevenz...@gmail.com <mailto:stevenz...@gmail.com>> wrote: >>>>>>>>>>>>>>>>> Shani, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> That is a good point. It is certainly a limitation for the >>>>>>>>>>>>>>>>> Flink job to track the inverted index internally (which is >>>>>>>>>>>>>>>>> what I had in mind). It can't be shared/synchronized with >>>>>>>>>>>>>>>>> other Flink jobs or other engines writing to the same table. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Steven >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar >>>>>>>>>>>>>>>>> <sh...@upsolver.com.invalid> wrote: >>>>>>>>>>>>>>>>>> Even if Flink can create this state, it would have to be >>>>>>>>>>>>>>>>>> maintained against the Iceberg table, we wouldn't like >>>>>>>>>>>>>>>>>> duplicates (keys) if other systems / users update the table >>>>>>>>>>>>>>>>>> (e.g manual insert / updates using DML). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Shani. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com >>>>>>>>>>>>>>>>>>> <mailto:stevenz...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> > Add support for inverted indexes to reduce the cost of >>>>>>>>>>>>>>>>>>> > position lookup. This is fairly tricky to implement for >>>>>>>>>>>>>>>>>>> > streaming use cases without an external system. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Anton, that is also what I was saying earlier. In Flink, >>>>>>>>>>>>>>>>>>> the inverted index of (key, committed data files) can be >>>>>>>>>>>>>>>>>>> tracked in Flink state. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi >>>>>>>>>>>>>>>>>>> <aokolnyc...@gmail.com <mailto:aokolnyc...@gmail.com>> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> I was a bit skeptical when we were adding equality >>>>>>>>>>>>>>>>>>>> deletes, but nothing beats their performance during >>>>>>>>>>>>>>>>>>>> writes. We have to find an alternative before deprecating. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> We are doing a lot of work to improve streaming, like >>>>>>>>>>>>>>>>>>>> reducing the cost of commits, enabling a large >>>>>>>>>>>>>>>>>>>> (potentially infinite) number of snapshots, changelog >>>>>>>>>>>>>>>>>>>> reads, and so on. It is a project goal to excel in >>>>>>>>>>>>>>>>>>>> streaming. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I was going to focus on equality deletes after completing >>>>>>>>>>>>>>>>>>>> the DV work. I believe we have these options: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> - Revisit the existing design of equality deletes (e.g. >>>>>>>>>>>>>>>>>>>> add more restrictions, improve compaction, offer new >>>>>>>>>>>>>>>>>>>> writers). >>>>>>>>>>>>>>>>>>>> - Standardize on the view-based approach [1] to handle >>>>>>>>>>>>>>>>>>>> streaming upserts and CDC use cases, potentially making >>>>>>>>>>>>>>>>>>>> this part of the spec. >>>>>>>>>>>>>>>>>>>> - Add support for inverted indexes to reduce the cost of >>>>>>>>>>>>>>>>>>>> position lookup. This is fairly tricky to implement for >>>>>>>>>>>>>>>>>>>> streaming use cases without an external system. Our >>>>>>>>>>>>>>>>>>>> runtime filtering in Spark today is equivalent to looking >>>>>>>>>>>>>>>>>>>> up positions in an inverted index represented by another >>>>>>>>>>>>>>>>>>>> Iceberg table. 
That may still not be enough for some >>>>>>>>>>>>>>>>>>>> streaming use cases. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/ >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> - Anton >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> чт, 31 жовт. 2024 р. о 21:31 Micah Kornfield >>>>>>>>>>>>>>>>>>>> <emkornfi...@gmail.com <mailto:emkornfi...@gmail.com>> >>>>>>>>>>>>>>>>>>>> пише: >>>>>>>>>>>>>>>>>>>>> I agree that equality deletes have their place in >>>>>>>>>>>>>>>>>>>>> streaming. I think the ultimate decision here is how >>>>>>>>>>>>>>>>>>>>> opinionated Iceberg wants to be on its use-cases. If it >>>>>>>>>>>>>>>>>>>>> really wants to stick to its origins of "slow moving >>>>>>>>>>>>>>>>>>>>> data", then removing equality deletes would be inline >>>>>>>>>>>>>>>>>>>>> with this. I think the other high level question is how >>>>>>>>>>>>>>>>>>>>> much we allow for partially compatible features (the row >>>>>>>>>>>>>>>>>>>>> lineage use-case feature was explicitly approved >>>>>>>>>>>>>>>>>>>>> excluding equality deletes, and people seemed OK with it >>>>>>>>>>>>>>>>>>>>> at the time. If all features need to work together, then >>>>>>>>>>>>>>>>>>>>> maybe we need to rethink the design here so it can be >>>>>>>>>>>>>>>>>>>>> forward compatible with equality deletes). >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I think one issue with equality deletes as stated in the >>>>>>>>>>>>>>>>>>>>> spec is that they are overly broad. I'd be interested if >>>>>>>>>>>>>>>>>>>>> people have any use cases that differ, but I think one >>>>>>>>>>>>>>>>>>>>> way of narrowing (and probably a necessary building block >>>>>>>>>>>>>>>>>>>>> for building something better) the specification scope >>>>>>>>>>>>>>>>>>>>> on equality deletes is to focus on upsert/Streaming >>>>>>>>>>>>>>>>>>>>> deletes. Two proposals in this regard are: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 1. Require that equality deletes can only correspond to >>>>>>>>>>>>>>>>>>>>> unique identifiers for the table. >>>>>>>>>>>>>>>>>>>>> 2. Consider requiring that for equality deletes on >>>>>>>>>>>>>>>>>>>>> partitioned tables, that the primary key must contain a >>>>>>>>>>>>>>>>>>>>> partition column (I believe Flink at least already does >>>>>>>>>>>>>>>>>>>>> this). It is less clear to me that this would meet all >>>>>>>>>>>>>>>>>>>>> existing use-cases. But having this would allow for >>>>>>>>>>>>>>>>>>>>> better incremental data-structures, which could then be >>>>>>>>>>>>>>>>>>>>> partition based. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Narrow scope to unique identifiers would allow for >>>>>>>>>>>>>>>>>>>>> further building blocks already mentioned, like a >>>>>>>>>>>>>>>>>>>>> secondary index (possible via LSM tree), that would allow >>>>>>>>>>>>>>>>>>>>> for better performance overall. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I generally agree with the sentiment that we shouldn't >>>>>>>>>>>>>>>>>>>>> deprecate them until there is a viable replacement. 
With >>>>>>>>>>>>>>>>>>>>> all due respect to my employer, let's not fall into the >>>>>>>>>>>>>>>>>>>>> Google trap [1] :) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>> Micah >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [1] https://goomics.net/50/ >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo >>>>>>>>>>>>>>>>>>>>> <alex...@starburstdata.com >>>>>>>>>>>>>>>>>>>>> <mailto:alex...@starburstdata.com>> wrote: >>>>>>>>>>>>>>>>>>>>>> Hey all, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Just to throw my 2 cents in, I agree with Steven and >>>>>>>>>>>>>>>>>>>>>> others that we do need some kind of replacement before >>>>>>>>>>>>>>>>>>>>>> deprecating equality deletes. >>>>>>>>>>>>>>>>>>>>>> They certainly have their problems, and do significantly >>>>>>>>>>>>>>>>>>>>>> increase complexity as they are now, but the writing of >>>>>>>>>>>>>>>>>>>>>> position deletes is too expensive for certain pipelines. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> We've been investigating using equality deletes for some >>>>>>>>>>>>>>>>>>>>>> of our workloads at Starburst, the key advantage we were >>>>>>>>>>>>>>>>>>>>>> hoping to leverage is cheap, effectively random access >>>>>>>>>>>>>>>>>>>>>> lookup deletes. >>>>>>>>>>>>>>>>>>>>>> Say you have a UUID column that's unique in a table and >>>>>>>>>>>>>>>>>>>>>> want to delete a row by UUID. With position deletes each >>>>>>>>>>>>>>>>>>>>>> delete is expensive without an index on that UUID. >>>>>>>>>>>>>>>>>>>>>> With equality deletes each delete is cheap and while >>>>>>>>>>>>>>>>>>>>>> reads/compaction is expensive but when updates are >>>>>>>>>>>>>>>>>>>>>> frequent and reads are sporadic that's a reasonable >>>>>>>>>>>>>>>>>>>>>> tradeoff. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Pretty much what Jason and Steven have already said. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Maybe there are some incremental improvements on >>>>>>>>>>>>>>>>>>>>>> equality deletes or tips from similar systems that might >>>>>>>>>>>>>>>>>>>>>> alleviate some of their problems? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> - Alex Jo >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu >>>>>>>>>>>>>>>>>>>>>> <stevenz...@gmail.com <mailto:stevenz...@gmail.com>> >>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> We probably all agree with the downside of equality >>>>>>>>>>>>>>>>>>>>>>> deletes: it postpones all the work on the read path. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> In theory, we can implement position deletes only in >>>>>>>>>>>>>>>>>>>>>>> the Flink streaming writer. It would require the >>>>>>>>>>>>>>>>>>>>>>> tracking of last committed data files per key, which >>>>>>>>>>>>>>>>>>>>>>> can be stored in Flink state (checkpointed). This is >>>>>>>>>>>>>>>>>>>>>>> obviously quite expensive/challenging, but possible. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I like to echo one benefit of equality deletes that >>>>>>>>>>>>>>>>>>>>>>> Russel called out in the original email. Equality >>>>>>>>>>>>>>>>>>>>>>> deletes would never have conflicts. that is important >>>>>>>>>>>>>>>>>>>>>>> for streaming writers (Flink, Kafka connect, ...) that >>>>>>>>>>>>>>>>>>>>>>> commit frequently (minutes or less). Assume Flink can >>>>>>>>>>>>>>>>>>>>>>> write position deletes only and commit every 2 minutes. 
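As a bare-bones illustration of the bookkeeping described just above - tracking, per primary key, where the last committed copy of a row lives so that an upsert can emit a position delete for it - here is a plain-Java sketch. The type names are made up; a real Flink writer would keep this map in checkpointed keyed state rather than on the heap, and would still have to keep it in sync with compactions and any other writers touching the table:

import java.util.HashMap;
import java.util.Map;

public class UpsertPositionTracker {
  /** Where the last committed copy of a key lives (illustrative type). */
  record RowLocation(String dataFilePath, long rowPosition) {}

  /** A position delete to include in the next commit (illustrative type). */
  record PositionDelete(String dataFilePath, long rowPosition) {}

  // Inverted index: primary key -> location of the last committed row with that key.
  private final Map<String, RowLocation> lastCommitted = new HashMap<>();

  /**
   * An upsert appends the new row and, if the key was seen before, emits a position
   * delete pointing at the previously committed copy. Returns null for a plain insert.
   */
  PositionDelete upsert(String key, String newDataFile, long newRowPosition) {
    RowLocation previous = lastCommitted.put(key, new RowLocation(newDataFile, newRowPosition));
    return previous == null
        ? null
        : new PositionDelete(previous.dataFilePath(), previous.rowPosition());
  }
}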
>>>>>>>>>>>>>>>>>>>>>>> The long-running nature of streaming jobs can cause >>>>>>>>>>>>>>>>>>>>>>> frequent commit conflicts with background delete >>>>>>>>>>>>>>>>>>>>>>> compaction jobs. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Overall, the streaming upsert write is not a well >>>>>>>>>>>>>>>>>>>>>>> solved problem in Iceberg. This probably affects all >>>>>>>>>>>>>>>>>>>>>>> streaming engines (Flink, Kafka connect, Spark >>>>>>>>>>>>>>>>>>>>>>> streaming, ...). We need to come up with some better >>>>>>>>>>>>>>>>>>>>>>> alternatives before we can deprecate equality deletes. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer >>>>>>>>>>>>>>>>>>>>>>> <russell.spit...@gmail.com >>>>>>>>>>>>>>>>>>>>>>> <mailto:russell.spit...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>>>>>>> For users of Equality Deletes, what are the key >>>>>>>>>>>>>>>>>>>>>>>> benefits to Equality Deletes that you would like to >>>>>>>>>>>>>>>>>>>>>>>> preserve and could you please share some concrete >>>>>>>>>>>>>>>>>>>>>>>> examples of the queries you want to run (and the >>>>>>>>>>>>>>>>>>>>>>>> schemas and data sizes you would like to run them >>>>>>>>>>>>>>>>>>>>>>>> against) and the latencies that would be acceptable? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine >>>>>>>>>>>>>>>>>>>>>>>> <ja...@upsolver.com.invalid> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Representing Upsolver here, we also make use of >>>>>>>>>>>>>>>>>>>>>>>>> Equality Deletes to deliver high frequency low >>>>>>>>>>>>>>>>>>>>>>>>> latency updates to our clients at scale. We have >>>>>>>>>>>>>>>>>>>>>>>>> customers using them at scale and demonstrating the >>>>>>>>>>>>>>>>>>>>>>>>> need and viability. We automate the process of >>>>>>>>>>>>>>>>>>>>>>>>> converting them into positional deletes (or fully >>>>>>>>>>>>>>>>>>>>>>>>> applying them) for more efficient engine queries in >>>>>>>>>>>>>>>>>>>>>>>>> the background giving our users both low latency and >>>>>>>>>>>>>>>>>>>>>>>>> good query performance. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Equality Deletes were added since there isn't a good >>>>>>>>>>>>>>>>>>>>>>>>> way to solve frequent updates otherwise. It would >>>>>>>>>>>>>>>>>>>>>>>>> require some sort of index keeping track of every >>>>>>>>>>>>>>>>>>>>>>>>> record in the table (by a predetermined PK) and >>>>>>>>>>>>>>>>>>>>>>>>> maintaining such an index is a huge task that every >>>>>>>>>>>>>>>>>>>>>>>>> tool interested in this would need to re-implement. >>>>>>>>>>>>>>>>>>>>>>>>> It also becomes a bottleneck limiting table sizes. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I don't think they should be removed without >>>>>>>>>>>>>>>>>>>>>>>>> providing an alternative. Positional Deletes have a >>>>>>>>>>>>>>>>>>>>>>>>> different performance profile inherently, requiring >>>>>>>>>>>>>>>>>>>>>>>>> more upfront work proportional to the table size. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré >>>>>>>>>>>>>>>>>>>>>>>>> <j...@nanthrax.net <mailto:j...@nanthrax.net>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> Hi Russell >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the nice writeup and the proposal. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I agree with your analysis, and I have the same >>>>>>>>>>>>>>>>>>>>>>>>>> feeling. 
However, I >>>>>>>>>>>>>>>>>>>>>>>>>> think there are more than Flink that write equality >>>>>>>>>>>>>>>>>>>>>>>>>> delete files. So, >>>>>>>>>>>>>>>>>>>>>>>>>> I agree to deprecate in V3, but maybe be more >>>>>>>>>>>>>>>>>>>>>>>>>> "flexible" about removal >>>>>>>>>>>>>>>>>>>>>>>>>> in V4 in order to give time to engines to update. >>>>>>>>>>>>>>>>>>>>>>>>>> I think that by deprecating equality deletes, we are >>>>>>>>>>>>>>>>>>>>>>>>>> clearly focusing >>>>>>>>>>>>>>>>>>>>>>>>>> on read performance and "consistency" (more than >>>>>>>>>>>>>>>>>>>>>>>>>> write). It's not >>>>>>>>>>>>>>>>>>>>>>>>>> necessarily a bad thing but the streaming platform >>>>>>>>>>>>>>>>>>>>>>>>>> and data ingestion >>>>>>>>>>>>>>>>>>>>>>>>>> platforms will be probably concerned about that (by >>>>>>>>>>>>>>>>>>>>>>>>>> using positional >>>>>>>>>>>>>>>>>>>>>>>>>> deletes, they will have to scan/read all datafiles >>>>>>>>>>>>>>>>>>>>>>>>>> to find the >>>>>>>>>>>>>>>>>>>>>>>>>> position, so painful). >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> So, to summarize: >>>>>>>>>>>>>>>>>>>>>>>>>> 1. Agree to deprecate equality deletes, but -1 to >>>>>>>>>>>>>>>>>>>>>>>>>> commit any target >>>>>>>>>>>>>>>>>>>>>>>>>> for deletion before having a clear path for >>>>>>>>>>>>>>>>>>>>>>>>>> streaming platforms >>>>>>>>>>>>>>>>>>>>>>>>>> (Flink, Beam, ...) >>>>>>>>>>>>>>>>>>>>>>>>>> 2. In the meantime (during the deprecation period), >>>>>>>>>>>>>>>>>>>>>>>>>> I propose to >>>>>>>>>>>>>>>>>>>>>>>>>> explore possible improvements for streaming >>>>>>>>>>>>>>>>>>>>>>>>>> platforms (maybe finding a >>>>>>>>>>>>>>>>>>>>>>>>>> way to avoid full data files scan, ...) >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks ! >>>>>>>>>>>>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>>>>>>>>>>>> JB >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer >>>>>>>>>>>>>>>>>>>>>>>>>> <russell.spit...@gmail.com >>>>>>>>>>>>>>>>>>>>>>>>>> <mailto:russell.spit...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > Background: >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > 1) Position Deletes >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > Writers determine what rows are deleted and mark >>>>>>>>>>>>>>>>>>>>>>>>>> > them in a 1 for 1 representation. With delete >>>>>>>>>>>>>>>>>>>>>>>>>> > vectors this means every data file has at most 1 >>>>>>>>>>>>>>>>>>>>>>>>>> > delete vector that it is read in conjunction with >>>>>>>>>>>>>>>>>>>>>>>>>> > to excise deleted rows. Reader overhead is more or >>>>>>>>>>>>>>>>>>>>>>>>>> > less constant and is very predictable. >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > The main cost of this mode is that deletes must be >>>>>>>>>>>>>>>>>>>>>>>>>> > determined at write time which is expensive and >>>>>>>>>>>>>>>>>>>>>>>>>> > can be more difficult for conflict resolution >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > 2) Equality Deletes >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > Writers write out reference to what values are >>>>>>>>>>>>>>>>>>>>>>>>>> > deleted (in a partition or globally). There can be >>>>>>>>>>>>>>>>>>>>>>>>>> > an unlimited number of equality deletes and they >>>>>>>>>>>>>>>>>>>>>>>>>> > all must be checked for every data file that is >>>>>>>>>>>>>>>>>>>>>>>>>> > read. The cost of determining deleted rows is >>>>>>>>>>>>>>>>>>>>>>>>>> > essentially given to the reader. 
>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > Conflicts almost never happen since data files are >>>>>>>>>>>>>>>>>>>>>>>>>> > not actually changed and there is almost no cost >>>>>>>>>>>>>>>>>>>>>>>>>> > to the writer to generate these. Almost all costs >>>>>>>>>>>>>>>>>>>>>>>>>> > related to equality deletes are passed on to the >>>>>>>>>>>>>>>>>>>>>>>>>> > reader. >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > Proposal: >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > Equality deletes are, in my opinion, unsustainable >>>>>>>>>>>>>>>>>>>>>>>>>> > and we should work on deprecating and removing >>>>>>>>>>>>>>>>>>>>>>>>>> > them from the specification. At this time, I know >>>>>>>>>>>>>>>>>>>>>>>>>> > of only one engine (Apache Flink) which produces >>>>>>>>>>>>>>>>>>>>>>>>>> > these deletes but almost all engines have >>>>>>>>>>>>>>>>>>>>>>>>>> > implementations to read them. The cost of >>>>>>>>>>>>>>>>>>>>>>>>>> > implementing equality deletes on the read path is >>>>>>>>>>>>>>>>>>>>>>>>>> > difficult and unpredictable in terms of memory >>>>>>>>>>>>>>>>>>>>>>>>>> > usage and compute complexity. We’ve had >>>>>>>>>>>>>>>>>>>>>>>>>> > suggestions of implementing rocksdb inorder to >>>>>>>>>>>>>>>>>>>>>>>>>> > handle ever growing sets of equality deletes which >>>>>>>>>>>>>>>>>>>>>>>>>> > in my opinion shows that we are going down the >>>>>>>>>>>>>>>>>>>>>>>>>> > wrong path. >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > Outside of performance, Equality deletes are also >>>>>>>>>>>>>>>>>>>>>>>>>> > difficult to use in conjunction with many other >>>>>>>>>>>>>>>>>>>>>>>>>> > features. For example, any features requiring CDC >>>>>>>>>>>>>>>>>>>>>>>>>> > or Row lineage are basically impossible when >>>>>>>>>>>>>>>>>>>>>>>>>> > equality deletes are in use. When Equality deletes >>>>>>>>>>>>>>>>>>>>>>>>>> > are present, the state of the table can only be >>>>>>>>>>>>>>>>>>>>>>>>>> > determined with a full scan making it difficult to >>>>>>>>>>>>>>>>>>>>>>>>>> > update differential structures. This means >>>>>>>>>>>>>>>>>>>>>>>>>> > materialized views or indexes need to essentially >>>>>>>>>>>>>>>>>>>>>>>>>> > be fully rebuilt whenever an equality delete is >>>>>>>>>>>>>>>>>>>>>>>>>> > added to the table. >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > Equality deletes essentially remove complexity >>>>>>>>>>>>>>>>>>>>>>>>>> > from the write side but then add what I believe is >>>>>>>>>>>>>>>>>>>>>>>>>> > an unacceptable level of complexity to the read >>>>>>>>>>>>>>>>>>>>>>>>>> > side. >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > Because of this I suggest we deprecate Equality >>>>>>>>>>>>>>>>>>>>>>>>>> > Deletes in V3 and slate them for full removal from >>>>>>>>>>>>>>>>>>>>>>>>>> > the Iceberg Spec in V4. >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > I know this is a big change and compatibility >>>>>>>>>>>>>>>>>>>>>>>>>> > breakage so I would like to introduce this idea to >>>>>>>>>>>>>>>>>>>>>>>>>> > the community and solicit feedback from all >>>>>>>>>>>>>>>>>>>>>>>>>> > stakeholders. I am very flexible on this issue and >>>>>>>>>>>>>>>>>>>>>>>>>> > would like to hear the best issues both for and >>>>>>>>>>>>>>>>>>>>>>>>>> > against removal of Equality Deletes. 
>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > Thanks everyone for your time, >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > Russ Spitzer >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Jason Fine >>>>>>>>>>>>>>>>>>>>>>>>> Chief Software Architect >>>>>>>>>>>>>>>>>>>>>>>>> ja...@upsolver.com <mailto:ja...@upsolver.com> | >>>>>>>>>>>>>>>>>>>>>>>>> www.upsolver.com <http://www.upsolver.com/> >>>>>>>>>>>>>
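To make the read-path asymmetry from the background section concrete, here is a simplified sketch of how a reader applies each delete flavor. The types are illustrative only, not the actual Iceberg reader code: a position delete (delete vector) is a constant-cost per-file bitmap lookup, while every outstanding equality delete is a predicate that has to be evaluated against each row of every data file read until the deletes are compacted away.

import java.util.BitSet;
import java.util.List;
import java.util.function.Predicate;

public class DeleteApplySketch {
  /** A row read from one data file, identified by its position within that file. */
  record Row(long position, Object[] columns) {}

  // Position deletes / delete vectors: at most one bitmap per data file, so dropping
  // deleted rows costs one lookup per row regardless of how many deletes have accumulated.
  static List<Row> applyPositionDeletes(List<Row> fileRows, BitSet deletedPositions) {
    return fileRows.stream()
        .filter(row -> !deletedPositions.get(Math.toIntExact(row.position())))
        .toList();
  }

  // Equality deletes: every accumulated delete predicate (e.g. id = 42) must be checked
  // against every row of every data file read, so read cost grows with the number of
  // outstanding equality deletes.
  static List<Row> applyEqualityDeletes(List<Row> fileRows, List<Predicate<Row>> equalityDeletes) {
    return fileRows.stream()
        .filter(row -> equalityDeletes.stream().noneMatch(delete -> delete.test(row)))
        .toList();
  }
}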