I think it is great to explore alternatives, but I still feel we shouldn't 
deprecate equality deletes until we have a clear path forward.

> On Nov 19, 2024, at 7:56 AM, Russell Spitzer <russell.spit...@gmail.com> 
> wrote:
> 
> I'm strongly in favor of moving to the Delta + Base table approach discussed 
> in the cookbook above. I wonder if we should codify that into something more 
> standardized but it seems to me to be a much better path forward. I'm not 
> sure we need to support this at the spec level but it would be nice if we 
> could provide a table that automatically was broken into sub tables and had 
> well defined operations on it.
> 
> For example:
> 
> FastUpdateTable:
>    Requires: 
>      Primary Key Columns
>      Long Max Delta Size
>    Contains: 
>        Private Iceberg Table: Delta
>        Private Iceberg Table: Base
>        
>    On All Scans -
>        Return a view which joins delta and base on primary key, if Delta has 
> a record for a given primary key discard the base record
> 
>   On All Writes -
>        Perform all writes against the delta table, only MERGE is allowed. 
> Append is forbidden (No PK Guarantees) Only position deletes are allowed.
> 
>    On Delta Table Size > Max Delta Size -
>        Upsert DELTA into BASE
>        Clear upserted records from Delta
> 
> 
> If the Delta Table size is kept small I think this would be almost as 
> performant as Equality deletes but still be compatible with row-lineage and 
> other indexing features.
>    
> 
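The FastUpdateTable outline above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: two in-memory dicts stand in for the private Delta and Base Iceberg tables, "size" is counted in rows, and a value of None marks a deleted key (the outline does not say how deletes are represented). The class and method names are illustrative, not an Iceberg API.

```python
from dataclasses import dataclass, field


@dataclass
class FastUpdateTable:
    """Toy model: two key->row dicts stand in for the Delta and Base tables."""
    max_delta_size: int
    base: dict = field(default_factory=dict)
    delta: dict = field(default_factory=dict)

    def merge(self, rows: dict) -> None:
        # All writes are keyed MERGEs into delta; None marks a deleted key (assumption).
        self.delta.update(rows)
        if len(self.delta) > self.max_delta_size:
            self.flush()

    def scan(self) -> dict:
        # Delta wins over base for any primary key present in both.
        merged = {**self.base, **self.delta}
        return {k: v for k, v in merged.items() if v is not None}

    def flush(self) -> None:
        # Upsert delta into base, then clear the upserted records from delta.
        for key, row in self.delta.items():
            if row is None:
                self.base.pop(key, None)
            else:
                self.base[key] = row
        self.delta.clear()
```

Scans only ever pay for a join against a delta that is bounded by max_delta_size, which is the property the outline relies on.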
> On Tue, Nov 19, 2024 at 7:12 AM Manu Zhang <owenzhang1...@gmail.com 
> <mailto:owenzhang1...@gmail.com>> wrote:
>> Hi Ajantha,
>> 
>> I'm proposing exploring a view-based approach similar to the 
>> changelog-mirror table pattern[1] rather than supporting delta writers for 
>> Kafka connect Iceberg sink.
>> 
>> 1. 
>> https://www.tabular.io/apache-iceberg-cookbook/data-engineering-cdc-table-mirroring/
>> 
>> On Tue, Nov 19, 2024 at 7:38 PM Jean-Baptiste Onofré <j...@nanthrax.net 
>> <mailto:j...@nanthrax.net>> wrote:
>>> I don’t think it’s a problem while an alternative is explored (the JDK 
>>> itself does that pretty often). 
>>> So it’s up to the community: of course I’m against removing it without 
>>> solid alternative, but deprecation is fine imho. 
>>> 
>>> Regards
>>> JB
>>> 
>>> On Tue, Nov 19, 2024 at 12:19 PM Ajantha Bhat <ajanthab...@gmail.com 
>>> <mailto:ajanthab...@gmail.com>> wrote:
>>>>> - ok to deprecate equality deletes
>>>>> - not ok to remove it
>>>>  
>>>> @JB: I don't think it is a good idea to use deprecated functionality in 
>>>> the new feature development. 
>>>> Hence, my specific question was about kafka connect upsert operation. 
>>>> 
>>>> @Manu: I meant the delta writers for kafka connect Iceberg sink (which are in 
>>>> turn used for upserting the CDC records)
>>>> https://github.com/apache/iceberg/issues/10842
>>>> 
>>>> 
>>>> - Ajantha
>>>> 
>>>> 
>>>> 
>>>> On Tue, Nov 19, 2024 at 3:08 PM Manu Zhang <owenzhang1...@gmail.com 
>>>> <mailto:owenzhang1...@gmail.com>> wrote:
>>>>> I second Anton's proposal to standardize on a view-based approach to 
>>>>> handle CDC cases.
>>>>> Actually, it's already been explored in detail[1] by Jack before.
>>>>> 
>>>>> [1] Improving Change Data Capture Use Case for Apache Iceberg 
>>>>> <https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit?tab=t.0#heading=h.94xnx4qg3bnt>
>>>>> 
>>>>> 
>>>>> On Tue, Nov 19, 2024 at 4:16 PM Jean-Baptiste Onofré <j...@nanthrax.net 
>>>>> <mailto:j...@nanthrax.net>> wrote:
>>>>>> My proposal is the following (already expressed):
>>>>>> - ok to deprecate equality deletes
>>>>>> - not ok to remove it
>>>>>> - work on position deletes improvements to address streaming use cases. 
>>>>>> I think we should explore different approaches. Personally I think a 
>>>>>> possible approach would be to find a way to index data files to avoid a 
>>>>>> full scan to find the row position. 
>>>>>> 
>>>>>> My $0.01 :)
>>>>>> 
>>>>>> Regards
>>>>>> JB
>>>>>> 
>>>>>> On Tue, Nov 19, 2024 at 7:53 AM Ajantha Bhat <ajanthab...@gmail.com 
>>>>>> <mailto:ajanthab...@gmail.com>> wrote:
>>>>>>> Hi, What's the conclusion on this thread? 
>>>>>>> 
>>>>>>> Users are looking for Upsert (CDC) support for OSS Iceberg kafka 
>>>>>>> connect sink. 
>>>>>>> We only support appends at the moment. Can we go ahead and implement 
>>>>>>> the upserts using equality deletes? 
>>>>>>> 
>>>>>>> 
>>>>>>> - Ajantha
>>>>>>> 
>>>>>>> On Sun, Nov 10, 2024 at 11:56 AM Vignesh <vignesh.v...@gmail.com 
>>>>>>> <mailto:vignesh.v...@gmail.com>> wrote:
>>>>>>>> Hi,
>>>>>>>> I am reading about iceberg and am quite new to this.
>>>>>>>> This puffin would be an index from key to data file. Other use cases 
>>>>>>>> of Puffin, such as statistics are at a per file level if I understand 
>>>>>>>> correctly.
>>>>>>>> 
>>>>>>>> Where would the puffin about key->data file be stored? It is a 
>>>>>>>> property of the entire table.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Vignesh.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sat, Nov 9, 2024 at 2:17 AM Shani Elharrar 
>>>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>>>> JB, this is what we do, we write Equality Deletes and periodically 
>>>>>>>>> convert them to Positional Deletes. 
>>>>>>>>> 
>>>>>>>>> We could probably index the keys, maybe partially index using bloom 
>>>>>>>>> filters, the best would be to put those bloom filters inside puffin. 
>>>>>>>>> 
>>>>>>>>> Shani.
>>>>>>>>> 
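The bloom-filter idea above can be illustrated with a minimal sketch: one filter per data file, consulted before opening the file to look for a key. Everything here is an assumption made for illustration (the BloomFilter class, its sizing, the use of SHA-256, and the candidate_files helper); how such filters would actually be serialized into Puffin is not shown.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; a Python int is used as the bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def candidate_files(filters, key):
    # Files whose filter rules the key out can be skipped without reading them.
    return [name for name, bf in filters.items() if bf.might_contain(key)]
```

Because a Bloom filter can return false positives but never false negatives, the surviving candidate files still have to be read to find the exact row position.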
>>>>>>>>>> On 9 Nov 2024, at 11:11, Jean-Baptiste Onofré <j...@nanthrax.net 
>>>>>>>>>> <mailto:j...@nanthrax.net>> wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I agree with Peter here, and I would say that it would be an issue 
>>>>>>>>>> for multi-engine support.
>>>>>>>>>> 
>>>>>>>>>> I think, as I already mentioned with others, we should explore an 
>>>>>>>>>> alternative.
>>>>>>>>>> As the main issue is the datafile scan in streaming context, maybe 
>>>>>>>>>> we could find a way to "index"/correlate for positional deletes with 
>>>>>>>>>> limited scanning.
>>>>>>>>>> I will think again about that :) 
>>>>>>>>>> 
>>>>>>>>>> Regards
>>>>>>>>>> JB
>>>>>>>>>> 
>>>>>>>>>> On Sat, Nov 9, 2024 at 6:48 AM Péter Váry 
>>>>>>>>>> <peter.vary.apa...@gmail.com <mailto:peter.vary.apa...@gmail.com>> 
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi Imran, 
>>>>>>>>>>> 
>>>>>>>>>>> I don't think it's a good idea to start creating multiple types of 
>>>>>>>>>>> Iceberg tables. Iceberg's main selling point is compatibility 
>>>>>>>>>>> between engines. If we don't have readers and writers for all types 
>>>>>>>>>>> of tables, then we remove compatibility from the equation and 
>>>>>>>>>>> engine specific formats always win. OTOH, if we write readers and 
>>>>>>>>>>> writers for all types of tables then we are back on square one.
>>>>>>>>>>> 
>>>>>>>>>>> Identifier fields are a table schema concept and used in many cases 
>>>>>>>>>>> during query planning and execution. This is why they are defined 
>>>>>>>>>>> as part of the SQL spec, and this is why Iceberg defines them as 
>>>>>>>>>>> well. One use case is where they can be used to merge deletes 
>>>>>>>>>>> (independently of how they are manifested) and subsequent inserts, 
>>>>>>>>>>> into updates.
>>>>>>>>>>> 
>>>>>>>>>>> Flink SQL doesn't allow creating tables with partition transforms, 
>>>>>>>>>>> so no new table could be created by Flink SQL using transforms, but 
>>>>>>>>>>> tables created by other engines could still be used (both read and 
>>>>>>>>>>> write). Also you can create such tables in Flink using the Java API.
>>>>>>>>>>> 
>>>>>>>>>>> Requiring partition columns be part of the identifier fields is 
>>>>>>>>>>> coming from the practical consideration, that you want to limit the 
>>>>>>>>>>> scope of the equality deletes as much as possible. Otherwise all of 
>>>>>>>>>>> the equality deletes should be table global, and they should be 
>>>>>>>>>>> read by every reader. We could write those, we just decided that we 
>>>>>>>>>>> don't want to allow the user to do this, as it is in most cases a bad 
>>>>>>>>>>> idea.
>>>>>>>>>>> 
>>>>>>>>>>> I hope this helps,
>>>>>>>>>>> Peter
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Nov 8, 2024, 22:01 Imran Rashid 
>>>>>>>>>>> <iras...@cloudera.com.invalid> wrote:
>>>>>>>>>>>> I'm not down in the weeds at all myself on implementation details, 
>>>>>>>>>>>> so forgive me if I'm wrong about the details here.
>>>>>>>>>>>> 
>>>>>>>>>>>> I can see all the viewpoints -- both that equality deletes enable 
>>>>>>>>>>>> some use cases, but also make others far more difficult.  What 
>>>>>>>>>>>> surprised me the most is that Iceberg does not provide a way to 
>>>>>>>>>>>> distinguish these two table "types".
>>>>>>>>>>>> 
>>>>>>>>>>>> At first, I thought the presence of an identifier-field 
>>>>>>>>>>>> (https://iceberg.apache.org/spec/#identifier-field-ids) indicated 
>>>>>>>>>>>> that the table was a target for equality deletes.  But, then it 
>>>>>>>>>>>> turns out identifier-fields are also useful for changelog views 
>>>>>>>>>>>> even without equality deletes -- IIUC, they show that a delete + 
>>>>>>>>>>>> insert should actually be interpreted as an update in changelog 
>>>>>>>>>>>> view.
>>>>>>>>>>>> 
>>>>>>>>>>>> To be perfectly honest, I'm confused about all of these details -- 
>>>>>>>>>>>> from my read, the spec does not indicate this relationship between 
>>>>>>>>>>>> identifier-fields and equality_ids in equality delete files 
>>>>>>>>>>>> (https://iceberg.apache.org/spec/#equality-delete-files), but I 
>>>>>>>>>>>> think that is the way Flink works.  Flink itself seems to have 
>>>>>>>>>>>> even more limitations -- no partition transforms are allowed, and 
>>>>>>>>>>>> all partition columns must be a subset of the identifier fields.  
>>>>>>>>>>>> Is that just a Flink limitation, or is that the intended behavior 
>>>>>>>>>>>> in the spec?  (Or maybe user-error on my part?)  Those seem like 
>>>>>>>>>>>> very reasonable limitations, from an implementation point-of-view. 
>>>>>>>>>>>>  But OTOH, as a user, this seems to be directly contrary to some 
>>>>>>>>>>>> of the promises of Iceberg.
>>>>>>>>>>>> 
>>>>>>>>>>>> It's easy to see if a table already has equality deletes in it, by 
>>>>>>>>>>>> looking at the metadata.  But is there any way to indicate that a 
>>>>>>>>>>>> table (or branch of a table) _must not_ have equality deletes 
>>>>>>>>>>>> added to it?
>>>>>>>>>>>> 
>>>>>>>>>>>> If that were possible, it seems like we could support both use 
>>>>>>>>>>>> cases.  We could continue to optimize for the streaming ingestion 
>>>>>>>>>>>> use cases using equality deletes.  But we could also build more 
>>>>>>>>>>>> optimizations into the "non-streaming-ingestion" branches.  And we 
>>>>>>>>>>>> could document the tradeoff so it is much clearer to end users.
>>>>>>>>>>>> 
>>>>>>>>>>>> To maintain compatibility, I suppose that the change would be that 
>>>>>>>>>>>> equality deletes continue to be allowed by default, but we'd add a 
>>>>>>>>>>>> new field to indicate that for some tables (or branches of a 
>>>>>>>>>>>> table), equality deletes would not be allowed.  And it would be an 
>>>>>>>>>>>> error for an engine to make an update which added an equality 
>>>>>>>>>>>> delete to such a table.
>>>>>>>>>>>> 
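The opt-out field suggested above could behave roughly like the sketch below. The property name, the exception type, and the shape of the added-file records are all hypothetical; nothing like this exists in the spec as discussed in this thread. The default stays permissive, matching the compatibility goal stated above.

```python
# Hypothetical property name, used only for this illustration.
EQUALITY_DELETES_ALLOWED = "equality-deletes-allowed"


class EqualityDeletesNotAllowed(Exception):
    """Raised when a commit adds equality deletes to a table that forbids them."""


def validate_commit(table_properties, added_files):
    # Equality deletes remain allowed unless the table explicitly opts out.
    allowed = table_properties.get(EQUALITY_DELETES_ALLOWED, "true") == "true"
    adds_equality_deletes = any(
        f["content"] == "equality_deletes" for f in added_files
    )
    if not allowed and adds_equality_deletes:
        raise EqualityDeletesNotAllowed("table forbids equality delete files")
```

An engine would run a check like this before committing, so a streaming writer pointed at an opted-out table fails fast instead of silently adding files that readers were told not to expect.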
>>>>>>>>>>>> Maybe that change would even be possible in V3.
>>>>>>>>>>>> 
>>>>>>>>>>>> And if all the performance improvements to equality deletes make 
>>>>>>>>>>>> this a moot point, we could drop the field in v4.  But it seems 
>>>>>>>>>>>> like a mistake to both limit the non-streaming use-case AND have 
>>>>>>>>>>>> confusing limitations for the end-user in the meantime.
>>>>>>>>>>>> 
>>>>>>>>>>>> I would happily be corrected about my understanding of all of the 
>>>>>>>>>>>> above.
>>>>>>>>>>>> 
>>>>>>>>>>>> thanks!
>>>>>>>>>>>> Imran
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Nov 5, 2024 at 9:16 AM Bryan Keller <brya...@gmail.com 
>>>>>>>>>>>> <mailto:brya...@gmail.com>> wrote:
>>>>>>>>>>>>> I also feel we should keep equality deletes until we have an 
>>>>>>>>>>>>> alternative solution for streaming updates/deletes.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Bryan
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Nov 4, 2024, at 8:33 AM, Péter Váry 
>>>>>>>>>>>>>> <peter.vary.apa...@gmail.com 
>>>>>>>>>>>>>> <mailto:peter.vary.apa...@gmail.com>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Well, it seems like I'm a little late, so most of the arguments 
>>>>>>>>>>>>>> are voiced.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I agree that we should not deprecate the equality deletes until 
>>>>>>>>>>>>>> we have a replacement feature.
>>>>>>>>>>>>>> I think one of the big advantages of Iceberg is that it supports 
>>>>>>>>>>>>>> batch processing and streaming ingestion too.
>>>>>>>>>>>>>> For streaming ingestion we need a way to update existing data in 
>>>>>>>>>>>>>> a performant way, but restricting deletes for the primary keys 
>>>>>>>>>>>>>> seems like enough from the streaming perspective.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Equality deletes allow a very wide range of applications, which 
>>>>>>>>>>>>>> we might be able to narrow down a bit, but still keep useful. So 
>>>>>>>>>>>>>> if we want to go down this road, we need to start collecting the 
>>>>>>>>>>>>>> requirements.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Nov 1, 2024 at 7:22 PM Shani Elharrar 
>>>>>>>>>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>>>> I understand how it makes sense for batch jobs, but it damages 
>>>>>>>>>>>>>>> stream jobs, using equality deletes works much better for 
>>>>>>>>>>>>>>> streaming (which have a strict SLA for delays), and in order to 
>>>>>>>>>>>>>>> decrease the performance penalty - systems can rewrite the 
>>>>>>>>>>>>>>> equality deletes to positional deletes. 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Shani.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 1 Nov 2024, at 20:06, Steven Wu <stevenz...@gmail.com 
>>>>>>>>>>>>>>>> <mailto:stevenz...@gmail.com>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Fundamentally, it is very difficult to write position deletes 
>>>>>>>>>>>>>>>> with concurrent writers and conflicts for batch jobs too, as 
>>>>>>>>>>>>>>>> the inverted index may become invalid/stale. 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The position deletes are created during the write phase. But 
>>>>>>>>>>>>>>>> conflicts are only detected at the commit stage. I assume the 
>>>>>>>>>>>>>>>> batch job should fail in this case.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Fri, Nov 1, 2024 at 10:57 AM Steven Wu 
>>>>>>>>>>>>>>>> <stevenz...@gmail.com <mailto:stevenz...@gmail.com>> wrote:
>>>>>>>>>>>>>>>>> Shani,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> That is a good point. It is certainly a limitation for the 
>>>>>>>>>>>>>>>>> Flink job to track the inverted index internally (which is 
>>>>>>>>>>>>>>>>> what I had in mind). It can't be shared/synchronized with 
>>>>>>>>>>>>>>>>> other Flink jobs or other engines writing to the same table.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Fri, Nov 1, 2024 at 10:50 AM Shani Elharrar 
>>>>>>>>>>>>>>>>> <sh...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>>>>>>> Even if Flink can create this state, it would have to be 
>>>>>>>>>>>>>>>>>> maintained against the Iceberg table, we wouldn't like 
>>>>>>>>>>>>>>>>>> duplicates (keys) if other systems / users update the table 
>>>>>>>>>>>>>>>>>> (e.g manual insert / updates using DML). 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Shani.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On 1 Nov 2024, at 18:32, Steven Wu <stevenz...@gmail.com 
>>>>>>>>>>>>>>>>>>> <mailto:stevenz...@gmail.com>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> > Add support for inverted indexes to reduce the cost of 
>>>>>>>>>>>>>>>>>>> > position lookup. This is fairly tricky to implement for 
>>>>>>>>>>>>>>>>>>> > streaming use cases without an external system.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Anton, that is also what I was saying earlier. In Flink, 
>>>>>>>>>>>>>>>>>>> the inverted index of (key, committed data files) can be 
>>>>>>>>>>>>>>>>>>> tracked in Flink state.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Fri, Nov 1, 2024 at 2:16 AM Anton Okolnychyi 
>>>>>>>>>>>>>>>>>>> <aokolnyc...@gmail.com <mailto:aokolnyc...@gmail.com>> 
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> I was a bit skeptical when we were adding equality 
>>>>>>>>>>>>>>>>>>>> deletes, but nothing beats their performance during 
>>>>>>>>>>>>>>>>>>>> writes. We have to find an alternative before deprecating.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> We are doing a lot of work to improve streaming, like 
>>>>>>>>>>>>>>>>>>>> reducing the cost of commits, enabling a large 
>>>>>>>>>>>>>>>>>>>> (potentially infinite) number of snapshots, changelog 
>>>>>>>>>>>>>>>>>>>> reads, and so on. It is a project goal to excel in 
>>>>>>>>>>>>>>>>>>>> streaming.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I was going to focus on equality deletes after completing 
>>>>>>>>>>>>>>>>>>>> the DV work. I believe we have these options:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> - Revisit the existing design of equality deletes (e.g. 
>>>>>>>>>>>>>>>>>>>> add more restrictions, improve compaction, offer new 
>>>>>>>>>>>>>>>>>>>> writers).
>>>>>>>>>>>>>>>>>>>> - Standardize on the view-based approach [1] to handle 
>>>>>>>>>>>>>>>>>>>> streaming upserts and CDC use cases, potentially making 
>>>>>>>>>>>>>>>>>>>> this part of the spec.
>>>>>>>>>>>>>>>>>>>> - Add support for inverted indexes to reduce the cost of 
>>>>>>>>>>>>>>>>>>>> position lookup. This is fairly tricky to implement for 
>>>>>>>>>>>>>>>>>>>> streaming use cases without an external system. Our 
>>>>>>>>>>>>>>>>>>>> runtime filtering in Spark today is equivalent to looking 
>>>>>>>>>>>>>>>>>>>> up positions in an inverted index represented by another 
>>>>>>>>>>>>>>>>>>>> Iceberg table. That may still not be enough for some 
>>>>>>>>>>>>>>>>>>>> streaming use cases.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> [1] - https://www.tabular.io/blog/hello-world-of-cdc/
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> - Anton
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 9:31 PM Micah Kornfield 
>>>>>>>>>>>>>>>>>>>> <emkornfi...@gmail.com <mailto:emkornfi...@gmail.com>> 
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> I agree that equality deletes have their place in 
>>>>>>>>>>>>>>>>>>>>> streaming.  I think the ultimate decision here is how 
>>>>>>>>>>>>>>>>>>>>> opinionated Iceberg wants to be on its use-cases.  If it 
>>>>>>>>>>>>>>>>>>>>> really wants to stick to its origins of "slow moving 
>>>>>>>>>>>>>>>>>>>>> data", then removing equality deletes would be in line 
>>>>>>>>>>>>>>>>>>>>> with this.  I think the other high level question is how 
>>>>>>>>>>>>>>>>>>>>> much we allow for partially compatible features (the row 
>>>>>>>>>>>>>>>>>>>>> lineage use-case feature was explicitly approved 
>>>>>>>>>>>>>>>>>>>>> excluding equality deletes, and people seemed OK with it 
>>>>>>>>>>>>>>>>>>>>> at the time.  If all features need to work together, then 
>>>>>>>>>>>>>>>>>>>>> maybe we need to rethink the design here so it can be 
>>>>>>>>>>>>>>>>>>>>> forward compatible with equality deletes).
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I think one issue with equality deletes as stated in the 
>>>>>>>>>>>>>>>>>>>>> spec is that they are overly broad.  I'd be interested if 
>>>>>>>>>>>>>>>>>>>>> people have any use cases that differ, but I think one 
>>>>>>>>>>>>>>>>>>>>> way of narrowing (and probably a necessary building block 
>>>>>>>>>>>>>>>>>>>>> for building something better)  the specification scope 
>>>>>>>>>>>>>>>>>>>>> on equality deletes is to focus on upsert/Streaming 
>>>>>>>>>>>>>>>>>>>>> deletes.  Two proposals in this regard are:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 1.  Require that equality deletes can only correspond to 
>>>>>>>>>>>>>>>>>>>>> unique identifiers for the table.
>>>>>>>>>>>>>>>>>>>>> 2.  Consider requiring that for equality deletes on 
>>>>>>>>>>>>>>>>>>>>> partitioned tables, that the primary key must contain a 
>>>>>>>>>>>>>>>>>>>>> partition column (I believe Flink at least already does 
>>>>>>>>>>>>>>>>>>>>> this).  It is less clear to me that this would meet all 
>>>>>>>>>>>>>>>>>>>>> existing use-cases.  But having this would allow for 
>>>>>>>>>>>>>>>>>>>>> better incremental data-structures, which could then be 
>>>>>>>>>>>>>>>>>>>>> partition based.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Narrowing the scope to unique identifiers would allow for 
>>>>>>>>>>>>>>>>>>>>> further building blocks already mentioned, like a 
>>>>>>>>>>>>>>>>>>>>> secondary index (possible via LSM tree), that would allow 
>>>>>>>>>>>>>>>>>>>>> for better performance overall.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I generally agree with the sentiment that we shouldn't 
>>>>>>>>>>>>>>>>>>>>> deprecate them until there is a viable replacement.  With 
>>>>>>>>>>>>>>>>>>>>> all due respect to my employer, let's not fall into the 
>>>>>>>>>>>>>>>>>>>>> Google trap [1] :) 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> Micah
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> [1] https://goomics.net/50/
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 12:35 PM Alexander Jo 
>>>>>>>>>>>>>>>>>>>>> <alex...@starburstdata.com 
>>>>>>>>>>>>>>>>>>>>> <mailto:alex...@starburstdata.com>> wrote:
>>>>>>>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Just to throw my 2 cents in, I agree with Steven and 
>>>>>>>>>>>>>>>>>>>>>> others that we do need some kind of replacement before 
>>>>>>>>>>>>>>>>>>>>>> deprecating equality deletes.
>>>>>>>>>>>>>>>>>>>>>> They certainly have their problems, and do significantly 
>>>>>>>>>>>>>>>>>>>>>> increase complexity as they are now, but the writing of 
>>>>>>>>>>>>>>>>>>>>>> position deletes is too expensive for certain pipelines.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> We've been investigating using equality deletes for some 
>>>>>>>>>>>>>>>>>>>>>> of our workloads at Starburst, the key advantage we were 
>>>>>>>>>>>>>>>>>>>>>> hoping to leverage is cheap, effectively random access 
>>>>>>>>>>>>>>>>>>>>>> lookup deletes.
>>>>>>>>>>>>>>>>>>>>>> Say you have a UUID column that's unique in a table and 
>>>>>>>>>>>>>>>>>>>>>> want to delete a row by UUID. With position deletes each 
>>>>>>>>>>>>>>>>>>>>>> delete is expensive without an index on that UUID. 
>>>>>>>>>>>>>>>>>>>>>> With equality deletes each delete is cheap, and while 
>>>>>>>>>>>>>>>>>>>>>> reads/compaction are expensive, when updates are 
>>>>>>>>>>>>>>>>>>>>>> frequent and reads are sporadic that's a reasonable 
>>>>>>>>>>>>>>>>>>>>>> tradeoff.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Pretty much what Jason and Steven have already said. 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Maybe there are some incremental improvements on 
>>>>>>>>>>>>>>>>>>>>>> equality deletes or tips from similar systems that might 
>>>>>>>>>>>>>>>>>>>>>> alleviate some of their problems?
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> - Alex Jo
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:58 AM Steven Wu 
>>>>>>>>>>>>>>>>>>>>>> <stevenz...@gmail.com <mailto:stevenz...@gmail.com>> 
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> We probably all agree with the downside of equality 
>>>>>>>>>>>>>>>>>>>>>>> deletes: it postpones all the work on the read path.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> In theory, we can implement position deletes only in 
>>>>>>>>>>>>>>>>>>>>>>> the Flink streaming writer. It would require the 
>>>>>>>>>>>>>>>>>>>>>>> tracking of last committed data files per key, which 
>>>>>>>>>>>>>>>>>>>>>>> can be stored in Flink state (checkpointed). This is 
>>>>>>>>>>>>>>>>>>>>>>> obviously quite expensive/challenging, but possible. 
>>>>>>>>>>>>>>>>>>>>>>> 
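The per-key tracking described above can be sketched as a small inverted index. This is a simplified illustration, not Flink code: a plain dict stands in for checkpointed Flink state, and the index stores a row position alongside the data file (an assumption; the message above only mentions tracking committed data files per key). The class name is hypothetical.

```python
class KeyPositionIndex:
    """Hypothetical in-memory index: key -> (data_file, row_position)."""

    def __init__(self):
        self._locations = {}

    def upsert(self, key, data_file, position):
        # Record where the new version of the row lives and return the
        # location of the previous version, which is exactly the position
        # delete the writer must emit. None means the key is new.
        previous = self._locations.get(key)
        self._locations[key] = (data_file, position)
        return previous
```

For example, writing key "k1" first to f1.parquet at position 0 and later to f2.parquet at position 5 makes the second upsert return ("f1.parquet", 0), with no data file scan. The index is only correct while this writer is the sole source of changes to the table.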
>>>>>>>>>>>>>>>>>>>>>>> I'd like to echo one benefit of equality deletes that 
>>>>>>>>>>>>>>>>>>>>>>> Russell called out in the original email. Equality 
>>>>>>>>>>>>>>>>>>>>>>> deletes would never have conflicts. That is important 
>>>>>>>>>>>>>>>>>>>>>>> for streaming writers (Flink, Kafka connect, ...) that 
>>>>>>>>>>>>>>>>>>>>>>> commit frequently (minutes or less). Assume Flink can 
>>>>>>>>>>>>>>>>>>>>>>> write position deletes only and commit every 2 minutes. 
>>>>>>>>>>>>>>>>>>>>>>> The long-running nature of streaming jobs can cause 
>>>>>>>>>>>>>>>>>>>>>>> frequent commit conflicts with background delete 
>>>>>>>>>>>>>>>>>>>>>>> compaction jobs.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Overall, the streaming upsert write is not a well 
>>>>>>>>>>>>>>>>>>>>>>> solved problem in Iceberg. This probably affects all 
>>>>>>>>>>>>>>>>>>>>>>> streaming engines (Flink, Kafka connect, Spark 
>>>>>>>>>>>>>>>>>>>>>>> streaming, ...). We need to come up with some better 
>>>>>>>>>>>>>>>>>>>>>>> alternatives before we can deprecate equality deletes.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 8:38 AM Russell Spitzer 
>>>>>>>>>>>>>>>>>>>>>>> <russell.spit...@gmail.com 
>>>>>>>>>>>>>>>>>>>>>>> <mailto:russell.spit...@gmail.com>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> For users of Equality Deletes, what are the key 
>>>>>>>>>>>>>>>>>>>>>>>> benefits to Equality Deletes that you would like to 
>>>>>>>>>>>>>>>>>>>>>>>> preserve and could you please share some concrete 
>>>>>>>>>>>>>>>>>>>>>>>> examples of the queries you want to run (and the 
>>>>>>>>>>>>>>>>>>>>>>>> schemas and data sizes you would like to run them 
>>>>>>>>>>>>>>>>>>>>>>>> against) and the latencies that would be acceptable?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 10:05 AM Jason Fine 
>>>>>>>>>>>>>>>>>>>>>>>> <ja...@upsolver.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Hi, 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Representing Upsolver here, we also make use of 
>>>>>>>>>>>>>>>>>>>>>>>>> Equality Deletes to deliver high frequency low 
>>>>>>>>>>>>>>>>>>>>>>>>> latency updates to our clients at scale. We have 
>>>>>>>>>>>>>>>>>>>>>>>>> customers using them at scale and demonstrating the 
>>>>>>>>>>>>>>>>>>>>>>>>> need and viability. We automate the process of 
>>>>>>>>>>>>>>>>>>>>>>>>> converting them into positional deletes (or fully 
>>>>>>>>>>>>>>>>>>>>>>>>> applying them) for more efficient engine queries in 
>>>>>>>>>>>>>>>>>>>>>>>>> the background giving our users both low latency and 
>>>>>>>>>>>>>>>>>>>>>>>>> good query performance. 
>>>>>>>>>>>>>>>>>>>>>>>>> 
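The background conversion mentioned above (equality deletes rewritten as positional deletes) amounts to resolving each deleted key to concrete row positions. The sketch below is a generic illustration and not Upsolver's implementation: data files are modeled as lists of dict rows, and sequence-number checks are left out. The function name and data shapes are assumptions.

```python
def to_position_deletes(data_files, deleted_keys, key):
    """Resolve deleted key values to row positions.

    data_files: {file_name: [row, ...]} where each row is a dict.
    Returns {file_name: [positions]} for files with at least one match.
    """
    positions = {}
    for name, rows in data_files.items():
        # This scan over every row is the upfront cost that equality
        # deletes let the streaming writer skip at write time.
        hits = [pos for pos, row in enumerate(rows) if row[key] in deleted_keys]
        if hits:
            positions[name] = hits
    return positions
```

Running this off the write path keeps ingestion latency low while readers eventually see only positional deletes.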
>>>>>>>>>>>>>>>>>>>>>>>>> Equality Deletes were added since there isn't a good 
>>>>>>>>>>>>>>>>>>>>>>>>> way to solve frequent updates otherwise. It would 
>>>>>>>>>>>>>>>>>>>>>>>>> require some sort of index keeping track of every 
>>>>>>>>>>>>>>>>>>>>>>>>> record in the table (by a predetermined PK) and 
>>>>>>>>>>>>>>>>>>>>>>>>> maintaining such an index is a huge task that every 
>>>>>>>>>>>>>>>>>>>>>>>>> tool interested in this would need to re-implement. 
>>>>>>>>>>>>>>>>>>>>>>>>> It also becomes a bottleneck limiting table sizes.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> I don't think they should be removed without 
>>>>>>>>>>>>>>>>>>>>>>>>> providing an alternative. Positional Deletes have a 
>>>>>>>>>>>>>>>>>>>>>>>>> different performance profile inherently, requiring 
>>>>>>>>>>>>>>>>>>>>>>>>> more upfront work proportional to the table size. 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 31, 2024 at 2:45 PM Jean-Baptiste Onofré 
>>>>>>>>>>>>>>>>>>>>>>>>> <j...@nanthrax.net <mailto:j...@nanthrax.net>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Russell
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the nice writeup and the proposal.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> I agree with your analysis, and I have the same 
>>>>>>>>>>>>>>>>>>>>>>>>>> feeling. However, I
>>>>>>>>>>>>>>>>>>>>>>>>>> think there are more engines than Flink that write equality 
>>>>>>>>>>>>>>>>>>>>>>>>>> delete files. So,
>>>>>>>>>>>>>>>>>>>>>>>>>> I agree to deprecate in V3, but maybe be more 
>>>>>>>>>>>>>>>>>>>>>>>>>> "flexible" about removal
>>>>>>>>>>>>>>>>>>>>>>>>>> in V4 in order to give time to engines to update.
>>>>>>>>>>>>>>>>>>>>>>>>>> I think that by deprecating equality deletes, we are 
>>>>>>>>>>>>>>>>>>>>>>>>>> clearly focusing
>>>>>>>>>>>>>>>>>>>>>>>>>> on read performance and "consistency" (more than 
>>>>>>>>>>>>>>>>>>>>>>>>>> write). It's not
>>>>>>>>>>>>>>>>>>>>>>>>>> necessarily a bad thing but the streaming platform 
>>>>>>>>>>>>>>>>>>>>>>>>>> and data ingestion
>>>>>>>>>>>>>>>>>>>>>>>>>> platforms will be probably concerned about that (by 
>>>>>>>>>>>>>>>>>>>>>>>>>> using positional
>>>>>>>>>>>>>>>>>>>>>>>>>> deletes, they will have to scan/read all datafiles 
>>>>>>>>>>>>>>>>>>>>>>>>>> to find the
>>>>>>>>>>>>>>>>>>>>>>>>>> position, so painful).
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> So, to summarize:
>>>>>>>>>>>>>>>>>>>>>>>>>> 1. Agree to deprecate equality deletes, but -1 to 
>>>>>>>>>>>>>>>>>>>>>>>>>> commit any target
>>>>>>>>>>>>>>>>>>>>>>>>>> for deletion before having a clear path for 
>>>>>>>>>>>>>>>>>>>>>>>>>> streaming platforms
>>>>>>>>>>>>>>>>>>>>>>>>>> (Flink, Beam, ...)
>>>>>>>>>>>>>>>>>>>>>>>>>> 2. In the meantime (during the deprecation period), 
>>>>>>>>>>>>>>>>>>>>>>>>>> I propose to
>>>>>>>>>>>>>>>>>>>>>>>>>> explore possible improvements for streaming 
>>>>>>>>>>>>>>>>>>>>>>>>>> platforms (maybe finding a
>>>>>>>>>>>>>>>>>>>>>>>>>> way to avoid full data files scan, ...)
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks !
>>>>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 30, 2024 at 10:06 PM Russell Spitzer
>>>>>>>>>>>>>>>>>>>>>>>>>> <russell.spit...@gmail.com 
>>>>>>>>>>>>>>>>>>>>>>>>>> <mailto:russell.spit...@gmail.com>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Background:
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > 1) Position Deletes
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Writers determine what rows are deleted and mark 
>>>>>>>>>>>>>>>>>>>>>>>>>> > them in a 1 for 1 representation. With delete 
>>>>>>>>>>>>>>>>>>>>>>>>>> > vectors this means every data file has at most 1 
>>>>>>>>>>>>>>>>>>>>>>>>>> > delete vector that it is read in conjunction with 
>>>>>>>>>>>>>>>>>>>>>>>>>> > to excise deleted rows. Reader overhead is more or 
>>>>>>>>>>>>>>>>>>>>>>>>>> > less constant and is very predictable.
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > The main cost of this mode is that deletes must be 
>>>>>>>>>>>>>>>>>>>>>>>>>> > determined at write time which is expensive and 
>>>>>>>>>>>>>>>>>>>>>>>>>> > can be more difficult for conflict resolution
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > 2) Equality Deletes
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Writers write out reference to what values are 
>>>>>>>>>>>>>>>>>>>>>>>>>> > deleted (in a partition or globally). There can be 
>>>>>>>>>>>>>>>>>>>>>>>>>> > an unlimited number of equality deletes and they 
>>>>>>>>>>>>>>>>>>>>>>>>>> > all must be checked for every data file that is 
>>>>>>>>>>>>>>>>>>>>>>>>>> > read. The cost of determining deleted rows is 
>>>>>>>>>>>>>>>>>>>>>>>>>> > essentially given to the reader.
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Conflicts almost never happen since data files are 
>>>>>>>>>>>>>>>>>>>>>>>>>> > not actually changed and there is almost no cost 
>>>>>>>>>>>>>>>>>>>>>>>>>> > to the writer to generate these. Almost all costs 
>>>>>>>>>>>>>>>>>>>>>>>>>> > related to equality deletes are passed on to the 
>>>>>>>>>>>>>>>>>>>>>>>>>> > reader.
>>>>>>>>>>>>>>>>>>>>>>>>>> >
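The difference in reader cost between the two delete modes described above can be seen in a small sketch. Rows are modeled as dicts and delete files as Python sets; the function names are illustrative, and a real reader would also have to respect sequence numbers, which is omitted here.

```python
def read_with_position_deletes(rows, deleted_positions):
    # One delete vector per data file: a row is dropped when its ordinal
    # position is in the vector. Cost is a single predictable pass.
    return [row for pos, row in enumerate(rows) if pos not in deleted_positions]


def read_with_equality_deletes(rows, equality_deletes, key):
    # Every applicable equality delete set must be loaded and checked
    # against every row; the number of sets is unbounded.
    deleted_keys = set().union(*equality_deletes)
    return [row for row in rows if row[key] not in deleted_keys]
```

In the first function the writer already did the work of finding positions; in the second the reader holds all delete keys in memory, which is where the unpredictable memory usage comes from.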
>>>>>>>>>>>>>>>>>>>>>>>>>> > Proposal:
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Equality deletes are, in my opinion, unsustainable 
>>>>>>>>>>>>>>>>>>>>>>>>>> > and we should work on deprecating and removing 
>>>>>>>>>>>>>>>>>>>>>>>>>> > them from the specification. At this time, I know 
>>>>>>>>>>>>>>>>>>>>>>>>>> > of only one engine (Apache Flink) which produces 
>>>>>>>>>>>>>>>>>>>>>>>>>> > these deletes but almost all engines have 
>>>>>>>>>>>>>>>>>>>>>>>>>> > implementations to read them. The cost of 
>>>>>>>>>>>>>>>>>>>>>>>>>> > implementing equality deletes on the read path is 
>>>>>>>>>>>>>>>>>>>>>>>>>> > difficult and unpredictable in terms of memory 
>>>>>>>>>>>>>>>>>>>>>>>>>> > usage and compute complexity. We’ve had 
>>>>>>>>>>>>>>>>>>>>>>>>>> > suggestions of implementing RocksDB in order to 
>>>>>>>>>>>>>>>>>>>>>>>>>> > handle ever growing sets of equality deletes which 
>>>>>>>>>>>>>>>>>>>>>>>>>> > in my opinion shows that we are going down the 
>>>>>>>>>>>>>>>>>>>>>>>>>> > wrong path.
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Outside of performance, Equality deletes are also 
>>>>>>>>>>>>>>>>>>>>>>>>>> > difficult to use in conjunction with many other 
>>>>>>>>>>>>>>>>>>>>>>>>>> > features. For example, any features requiring CDC 
>>>>>>>>>>>>>>>>>>>>>>>>>> > or Row lineage are basically impossible when 
>>>>>>>>>>>>>>>>>>>>>>>>>> > equality deletes are in use. When Equality deletes 
>>>>>>>>>>>>>>>>>>>>>>>>>> > are present, the state of the table can only be 
>>>>>>>>>>>>>>>>>>>>>>>>>> > determined with a full scan making it difficult to 
>>>>>>>>>>>>>>>>>>>>>>>>>> > update differential structures. This means 
>>>>>>>>>>>>>>>>>>>>>>>>>> > materialized views or indexes need to essentially 
>>>>>>>>>>>>>>>>>>>>>>>>>> > be fully rebuilt whenever an equality delete is 
>>>>>>>>>>>>>>>>>>>>>>>>>> > added to the table.
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Equality deletes essentially remove complexity 
>>>>>>>>>>>>>>>>>>>>>>>>>> > from the write side but then add what I believe is 
>>>>>>>>>>>>>>>>>>>>>>>>>> > an unacceptable level of complexity to the read 
>>>>>>>>>>>>>>>>>>>>>>>>>> > side.
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Because of this I suggest we deprecate Equality 
>>>>>>>>>>>>>>>>>>>>>>>>>> > Deletes in V3 and slate them for full removal from 
>>>>>>>>>>>>>>>>>>>>>>>>>> > the Iceberg Spec in V4.
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > I know this is a big change and compatibility 
>>>>>>>>>>>>>>>>>>>>>>>>>> > breakage so I would like to introduce this idea to 
>>>>>>>>>>>>>>>>>>>>>>>>>> > the community and solicit feedback from all 
>>>>>>>>>>>>>>>>>>>>>>>>>> > stakeholders. I am very flexible on this issue and 
>>>>>>>>>>>>>>>>>>>>>>>>>> > would like to hear the best issues both for and 
>>>>>>>>>>>>>>>>>>>>>>>>>> > against removal of Equality Deletes.
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Thanks everyone for your time,
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Russ Spitzer
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Jason Fine
>>>>>>>>>>>>>>>>>>>>>>>>> Chief Software Architect
>>>>>>>>>>>>>>>>>>>>>>>>> ja...@upsolver.com <mailto:ja...@upsolver.com>  | 
>>>>>>>>>>>>>>>>>>>>>>>>> www.upsolver.com <http://www.upsolver.com/>
>>>>>>>>>>>>> 
