Great idea! I've replaced all "no update" with "immutable" in the FLIP. Since the title has been updated, here is the new link[1] to this FLIP. Additionally, if there are no further comments, I will initiate the voting later tomorrow.
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-566%3A+Introduce+a+new+IMMUTABLE+columns+constraint

--
Best!
Xuyang

At 2026-03-03 01:01:17, "Gustavo de Morais" <[email protected]> wrote:
>Hey Xuyang,
>
>Thanks for the reply and the discussion. I now understand your primary goal
>with the FLIP. If we want to enable these optimizations, PK-lifetime
>immutability makes sense!
>
>I'm fine with keeping the deletion restriction for v1. Suggestion: what if
>we called it IMMUTABLE instead of NO UPDATE? It's shorter and more naturally
>implies PK-lifetime semantics rather than just no UPDATE operations.
>If you'd rather go with NO UPDATE, we could introduce a separate NO UPDATE
>ALLOW DELETES variant later if there's demand.
>
>Kind regards,
>Gustavo
>
>On Sat, 28 Feb 2026 at 04:48, Xuyang <[email protected]> wrote:
>
>> Hi, Gustavo. I’d be glad to explore a more relaxed design together with
>> you! Let me share my thoughts.
>>
>> When designing this FLIP, my original intention for the "no update" column
>> semantics was: as long as the primary key remains the same, the values in
>> these columns must not change. This allows us to safely treat the "no
>> update" columns as part of the source's unique (upsert) key, without
>> needing to track or interpret the row kind of intermediate results. The
>> reason is that all intermediate data ultimately originates from the source,
>> unless an explicit key transformation occurs (e.g. an aggregation with a
>> grouping key).
>>
>> I did not prohibit -U (update-before) records because, in most storage
>> systems, an update is treated as a single atomic operation; Flink merely
>> decomposes it into -U and +U for internal processing. Storage systems can
>> easily enforce the "no update" constraint by checking whether a single
>> update violates immutability for certain columns.
>>
>> However, if we instead restrict the immutability guarantee to the span
>> between +I and -D, then because upstream operators often emit multiple
>> +I/-D record pairs, we will lose the ability to leverage "no update"
>> column information for optimizations. Consider the following example:
>> ```
>> create table src1(a int, b int, c int, primary key(a) not enforced,
>> column(c) no update not enforced);
>> create table src2(id int, d int, primary key(id) not enforced);
>> select * from src1 join src2 on c = id;
>> ```
>> This join prevents the upstream from generating DropUpdateBefore because,
>> despite being declared as immutable, column `c` can actually take different
>> values for the same primary key `a` across different +I and -D events. As a
>> result, the join key `c` is not actually stable, causing records to be
>> shuffled to different parallel tasks. Consequently, the system must rely on
>> -U to correctly process the data.
>>
>> --
>>
>> Best!
>> Xuyang
>>
>> At 2026-02-27 21:35:08, "Gustavo de Morais" <[email protected]>
>> wrote:
>> >Hey Xuyang,
>> >
>> >Thanks for the reply.
>> >
>> >- Could you give an example for "which could lead to the no-update
>> >metadata being incorrectly applied in certain scenarios"?
>> >- Also, if we're banning -D because joins can't distinguish it from -U,
>> >then by the same logic we'd need to ban -U, right? Could you specify that
>> >in the FLIP?
>> >
>> >Just to clarify, I'm +1 for the FLIP, I'm just wondering if we could make
>> >it more general.
>> >
>> >Kind regards,
>> >Gustavo
>> >
>> >On Fri, 27 Feb 2026 at 06:38, Xuyang <[email protected]> wrote:
>> >
>> >> Hi, Gustavo.
>> >> You're absolutely right! In an ideal scenario, the lifecycle of immutable
>> >> columns should indeed be confined within the sequence +I, -U, +U, -D.
>> >> However, in Flink today, we don't fully distinguish between (+I, -D) and
>> >> (+U, -U) (e.g., at join nodes), which could lead to the no-update
>> >> metadata being incorrectly applied in certain scenarios.
>> >>
>> >> Therefore, for simplicity, I'd prefer not to support -D for this first
>> >> step. I'd like to hear your thoughts on this.
>> >>
>> >> --
>> >>
>> >> Best!
>> >> Xuyang
>> >>
>> >> At 2026-02-26 19:03:57, "Gustavo de Morais" <[email protected]>
>> >> wrote:
>> >> >Hey Xuyang,
>> >> >
>> >> >Thanks for the updates and reply!
>> >> >
>> >> >Regarding dropping the restriction: In my thinking, in Flink's changelog
>> >> >semantics, -D ends the row's lifetime. When we see -D followed by +I for
>> >> >the same PK, like in the example you gave, that's imo creating a new row
>> >> >rather than updating the existing one. I don't think it makes sense to
>> >> >start tracking relations between "conceptually different rows". If it were
>> >> >an update and still the same row, I'd expect a +/-U instead.
>> >> >
>> >> >So I'm still inclined to be +1 on dropping it. That means NO UPDATE
>> >> >would mean "immutable while the row exists" rather than "immutable for this
>> >> >PK forever". For DeltaJoin's stricter needs (no deletes), we could enforce
>> >> >that separately during planning. Does that distinction make sense to you?
>> >> >Let me know what you think.
>> >> >
>> >> >Kind regards,
>> >> >Gustavo
>> >> >
>> >> >On Thu, 26 Feb 2026 at 05:17, Xuyang <[email protected]> wrote:
>> >> >
>> >> >> Hi Gustavo,
>> >> >> Regarding your suggestion to remove the deletion restriction: it's not
>> >> >> only tied to delta joins. My primary concern has been the significant
>> >> >> overhead for the storage engine in tracking no update column changes
>> >> >> across separate INSERT and DELETE messages.
>> >> >> Consider this example, where col1 is the primary key and col3 is declared
>> >> >> as a NO UPDATE column:
>> >> >> Schema: (col1, col2, col3)
>> >> >> 1) +I (pk1, a1, b1)
>> >> >> 2) -D (pk1, a1, b1)
>> >> >> 3) +I (pk1, a2, b2)
>> >> >> In this sequence, the value of the no update column col3 changes from b1
>> >> >> to b2, which violates the NO UPDATE constraint.
>> >> >> If we relax the restriction on DELETE operations, we would effectively
>> >> >> shift the responsibility to users to guarantee that values of no update
>> >> >> columns remain consistent across corresponding INSERT, DELETE, and UPDATE
>> >> >> records.
>> >> >> Given these implications, I’d like to hear your thoughts on whether we
>> >> >> should proceed with removing this restriction.
>> >> >>
>> >> >> Separately, regarding the additional details you mentioned, I’ve added
>> >> >> them to the FLIP. Here’s a quick summary:
>> >> >> - If a user declares NO UPDATE (b, c) on a table without a primary key, an
>> >> >> error will be thrown: “NO UPDATE constraints must be defined on tables with
>> >> >> a primary key.”
>> >> >> - If a user declares NO UPDATE(a) and column a is already part of the
>> >> >> primary key, the declaration is silently accepted.
>> >> >> - Updated: CONSTRAINT %s COLUMNS (%s) NO UPDATE%s
>> >> >>
>> >> >> --
>> >> >>
>> >> >> Best!
>> >> >> Xuyang
>> >> >>
>> >> >> At 2026-02-19 20:10:11, "Gustavo de Morais" <[email protected]>
>> >> >> wrote:
>> >> >> >Hey Xuyang,
>> >> >> >
>> >> >> >That's an interesting concept, thanks for the proposal!
>> >> >> >
>> >> >> >I like the FLIP and I think this could open up some other optimizations.
>> >> >> >That said, I think it makes sense to remove the deletion restriction from
>> >> >> >the FLIP, since it's mostly a necessity that comes from the DeltaJoin. We
>> >> >> >could make NO UPDATE be about immutability, which is not directly connected
>> >> >> >to row permanence. As far as I know, the DeltaJoin already enforces the
>> >> >> >deletion restriction during planning for its sources, so it doesn't have to
>> >> >> >be enforced by this functionality as well.
>> >> >> >
>> >> >> >Also, some small clarifications that could be added to the FLIP:
>> >> >> >- If someone declares NO UPDATE (b, c) on a table without a primary key,
>> >> >> >I suppose that's an error?
>> >> >> >- If someone declares NO UPDATE(a) and a is already a primary key, is it
>> >> >> >an error or do we silently accept it?
>> >> >> >- nit: CONSTRAINT %s FIELDS (%s) NO UPDATE%s -> you mean COLUMNS instead of
>> >> >> >FIELDS, right?
>> >> >> >
>> >> >> >Kind regards,
>> >> >> >Gustavo
>> >> >> >
>> >> >> >On Fri, 13 Feb 2026 at 10:08, Xuyang <[email protected]> wrote:
>> >> >> >
>> >> >> >> Hi, everyone.
>> >> >> >> I’d like to propose FLIP-566: Introduce a new NO UPDATE column
>> >> >> >> constraint[1].
>> >> >> >> Flink has introduced the Delta Join, whose core advantage lies in
>> >> >> >> replacing redundant local state storage with direct queries to external
>> >> >> >> storage systems (e.g., Apache Fluss). It currently relies on the upsert
>> >> >> >> key, which ensures correct changelog processing without UPDATE_BEFORE
>> >> >> >> messages. But this assumes the join key must be part of the primary key.
>> >> >> >> As modern storage systems increasingly support general-purpose secondary
>> >> >> >> indexes (not limited to primary keys), this restriction is becoming
>> >> >> >> outdated. We need a new semantic mechanism to guarantee the immutability
>> >> >> >> of the join key: specifically, that for a given primary key, the column
>> >> >> >> values comprising the join key cannot be modified.
>> >> >> >> Looking forward to your feedback.
>> >> >> >>
>> >> >> >> [1]
>> >> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-566%3A+Introduce+a+new+NO+UPDATE+constraint
>> >> >> >>
>> >> >> >> --
>> >> >> >>
>> >> >> >> Best!
>> >> >> >> Xuyang

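Appended sketch for readers of this thread: the two semantics debated above ("immutable for this PK forever" versus "immutable while the row exists") can be compared with a small changelog checker. This is only an illustration in Python and not Flink code; the function `find_violations`, its parameters, and the tuple-based record layout are all hypothetical names chosen for the example. The input is the +I / -D / +I sequence from the 2026-02-26 message, where col1 is the primary key and col3 is the constrained column.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# A changelog record: (row kind, row values). Row kinds use the notation
# from the thread: +I insert, -U update-before, +U update-after, -D delete.
Record = Tuple[str, Tuple]


def find_violations(
    changelog: Iterable[Record],
    pk_index: int,
    immutable_indexes: Sequence[int],
    reset_on_delete: bool,
) -> List[int]:
    """Return positions of records whose immutable columns differ from the
    values first seen for the same primary key.

    reset_on_delete=False models "immutable for this PK forever".
    reset_on_delete=True models "immutable while the row exists".
    (Hypothetical helper, not part of any Flink API.)
    """
    first_seen: Dict[object, Tuple] = {}
    violations: List[int] = []
    for pos, (kind, row) in enumerate(changelog):
        pk = row[pk_index]
        values = tuple(row[i] for i in immutable_indexes)
        # Remember the first values seen for this key, then compare.
        expected = first_seen.setdefault(pk, values)
        if values != expected:
            violations.append(pos)
        if kind == "-D" and reset_on_delete:
            # The row's lifetime ended; a later +I starts a new row.
            first_seen.pop(pk, None)
    return violations


# The sequence from the thread: col1 is the primary key, col3 is constrained.
example = [
    ("+I", ("pk1", "a1", "b1")),
    ("-D", ("pk1", "a1", "b1")),
    ("+I", ("pk1", "a2", "b2")),
]

pk_lifetime = find_violations(example, 0, [2], reset_on_delete=False)
row_lifetime = find_violations(example, 0, [2], reset_on_delete=True)
print(pk_lifetime)   # [2]
print(row_lifetime)  # []
```

Under PK-lifetime semantics the third record is flagged because col3 changes from b1 to b2 for pk1; under row-lifetime semantics the -D resets the tracked value and nothing is flagged. That gap is the case the thread argues would make the join key unstable.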