Hey Xuyang,

Thanks for the update and for driving this. Sounds good and +1 from my side.

Kind regards,
Gustavo

On Tue, 3 Mar 2026 at 09:28, Xuyang <[email protected]> wrote:

> Great idea! I've replaced all "no update" with "immutable" in the FLIP.
> Since the title has been updated, here is the new link[1] to this FLIP.
> Additionally, if there are no further comments, I will initiate the voting
> later tomorrow.
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-566%3A+Introduce+a+new+IMMUTABLE+columns+constraint
>
>
> --
>
> Best!
> Xuyang
>
>
> At 2026-03-03 01:01:17, "Gustavo de Morais" <[email protected]> wrote:
> >Hey Xuyang,
> >
> >Thanks for the reply and the discussion. I now understand your primary
> >goal with the FLIP. If we want to enable these optimizations, PK-lifetime
> >immutability makes sense!
> >
> >I'm fine with keeping the deletion restriction for v1. Suggestion: what if
> >we called it IMMUTABLE instead of NO UPDATE? It's shorter and more
> >naturally implies PK-lifetime semantics rather than just the absence of
> >UPDATE operations. If you'd rather go with NO UPDATE, we could introduce a
> >separate NO UPDATE ALLOW DELETES variant later if there's demand.
> >
> >Kind regards,
> >Gustavo
> >
> >
> >On Sat, 28 Feb 2026 at 04:48, Xuyang <[email protected]> wrote:
> >
> >> Hi, Gustavo. I’d be glad to explore a more relaxed design together with
> >> you! Let me share my thoughts.
> >>
> >>
> >> When designing this FLIP, my original intention for the "no update"
> >> column semantics was: as long as the primary key remains the same, the
> >> values in these columns must not change. This allows us to safely treat
> >> the "no update" columns as part of the source's unique (upsert) key,
> >> without needing to track or interpret the row kind of intermediate
> >> results. The reason is that all intermediate data ultimately originates
> >> from the source, unless an explicit key transformation occurs (e.g. an
> >> aggregation with a grouping key).
> >>
> >>
> >> I did not prohibit -U (update-before) records because, in most storage
> >> systems, an update is treated as a single atomic operation; Flink merely
> >> decomposes it into -U and +U for internal processing. Storage systems
> >> can easily enforce the "no update" constraint by checking whether a
> >> single update violates immutability for certain columns.
> >>
> >>
> >> However, if we instead restrict the immutability guarantee to the span
> >> between +I and -D, we will lose the ability to leverage "no update"
> >> column information for optimizations, because upstream operators often
> >> emit multiple +I/-D record pairs. Consider the following example:
> >> ```
> >> create table src1(a int, b int, c int, primary key(a) not enforced,
> >> column(c) no update not enforced);
> >> create table src2(id int, d int, primary key(id) not enforced);
> >> select * from src1 join src2 on c = id;
> >> ```
> >> This join prevents the upstream from generating DropUpdateBefore
> >> because, despite being declared as immutable, column `c` can actually
> >> take different values for the same primary key `a` across different +I
> >> and -D events. As a result, the join key `c` is not actually stable,
> >> causing records to be shuffled to different parallel tasks.
> >> Consequently, the system must rely on -U to correctly process the data.
> >>
> >>
> >> --
> >>
> >> Best!
> >> Xuyang
> >>
> >>
> >> At 2026-02-27 21:35:08, "Gustavo de Morais" <[email protected]> wrote:
> >> >Hey Xuyang,
> >> >
> >> >Thanks for the reply.
> >> >
> >> >- Could you give an example for "which could maybe lead to the
> >> >no-update metadata being incorrectly applied in some certain scenario"?
> >> >- Also, if we're banning -D because joins can't distinguish it from -U,
> >> >then by the same logic we'd need to ban -U, right? Could you specify
> >> >that in the FLIP?
> >> >
> >> >Just to clarify, I'm +1 for the FLIP, I'm just wondering if we could
> >> >make it more general.
> >> >
> >> >Kind regards,
> >> >Gustavo
> >> >
> >> >On Fri, 27 Feb 2026 at 06:38, Xuyang <[email protected]> wrote:
> >> >
> >> >> Hi, Gustavo.
> >> >> You're absolutely right! In an ideal scenario, the lifecycle of
> >> >> immutable columns should indeed be confined within the sequence
> >> >> +I, -U, +U, -D. However, in Flink today, we don't fully distinguish
> >> >> between (+I, -D) and (+U, -U) (e.g., at join nodes), which could
> >> >> maybe lead to the no-update metadata being incorrectly applied in
> >> >> some certain scenarios.
> >> >>
> >> >>
> >> >> Therefore, for simplicity, I'd prefer not to support -D for this
> >> >> first step. I'd like to hear your thoughts on this.
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >> Best!
> >> >> Xuyang
> >> >>
> >> >>
> >> >> At 2026-02-26 19:03:57, "Gustavo de Morais" <[email protected]> wrote:
> >> >> >Hey Xuyang,
> >> >> >
> >> >> >Thanks for the updates and reply!
> >> >> >
> >> >> >Regarding dropping the restriction: In my thinking, in Flink's
> >> >> >changelog semantics, -D ends the row's lifetime. When we see -D
> >> >> >followed by +I for the same PK, like in the example you gave, that's
> >> >> >imo creating a new row rather than updating the existing one. I
> >> >> >don't think it makes sense to start tracking relations between
> >> >> >"conceptually different rows". If it were an update and still the
> >> >> >same row, I'd expect a +/-U instead.
> >> >> >
> >> >> >So I'm still leaning towards +1 on dropping it. That is, NO UPDATE
> >> >> >would mean "immutable while the row exists" rather than "immutable
> >> >> >for this PK forever". For DeltaJoin's stricter needs (no deletes),
> >> >> >we could enforce that separately during planning.
> >> >> >Does that distinction make sense to you?
> >> >> >Let me know what you think.
> >> >> >
> >> >> >Kind regards,
> >> >> >Gustavo
> >> >> >
> >> >> >On Thu, 26 Feb 2026 at 05:17, Xuyang <[email protected]> wrote:
> >> >> >
> >> >> >> Hi Gustavo,
> >> >> >> Regarding your suggestion to remove the deletion restriction: it's
> >> >> >> not only tied to delta joins. My primary concern has been the
> >> >> >> significant overhead for the storage engine in tracking no update
> >> >> >> column changes across separate INSERT and DELETE messages.
> >> >> >> Consider this example, where col1 is the primary key and col3 is
> >> >> >> declared as a NO UPDATE column:
> >> >> >> Schema: (col1, col2, col3)
> >> >> >> 1) +I (pk1, a1, b1)
> >> >> >> 2) -D (pk1, a1, b1)
> >> >> >> 3) +I (pk1, a2, b2)
> >> >> >> In this sequence, the value of the no update column col3 changes
> >> >> >> from b1 to b2, which violates the NO UPDATE constraint.
> >> >> >> If we relax the restriction on DELETE operations, we would
> >> >> >> effectively shift the responsibility to users to guarantee that
> >> >> >> values of no update columns remain consistent across corresponding
> >> >> >> INSERT, DELETE, and UPDATE records.
> >> >> >> Given these implications, I’d like to hear your thoughts on
> >> >> >> whether we should proceed with removing this restriction.
> >> >> >>
> >> >> >>
> >> >> >> Separately, regarding the additional details you mentioned, I’ve
> >> >> >> added them to the FLIP. Here’s a quick summary:
> >> >> >> - If a user declares NO UPDATE (b, c) on a table without a primary
> >> >> >> key, an error will be thrown: “NO UPDATE constraints must be
> >> >> >> defined on tables with a primary key.”
> >> >> >> - If a user declares NO UPDATE(a) and column a is already part of
> >> >> >> the primary key, the declaration is silently accepted.
> >> >> >> - Updated: CONSTRAINT %s COLUMNS (%s) NO UPDATE%s
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >>
> >> >> >> Best!
> >> >> >> Xuyang
> >> >> >>
> >> >> >>
> >> >> >> At 2026-02-19 20:10:11, "Gustavo de Morais" <[email protected]> wrote:
> >> >> >> >Hey Xuyang,
> >> >> >> >
> >> >> >> >That's an interesting concept, thanks for the proposal!
> >> >> >> >
> >> >> >> >I like the FLIP and I think this could open some other
> >> >> >> >optimizations. That said, I think it makes sense to remove the
> >> >> >> >deletion restriction from the FLIP - since it's mostly a
> >> >> >> >necessity that comes from the DeltaJoin. We could make NO UPDATE
> >> >> >> >be about immutability which is not directly connected to row
> >> >> >> >permanence. As far as I know, the DeltaJoin already enforces the
> >> >> >> >deletion restriction during planning for its sources, so it
> >> >> >> >doesn't have to be enforced by this functionality as well.
> >> >> >> >
> >> >> >> >Also, some small clarifications that could be added to the FLIP:
> >> >> >> >- If someone declares NO UPDATE (b, c) on a table without a
> >> >> >> >primary key. I suppose that's an error?
> >> >> >> >- If someone declares NO UPDATE(a) and a is already a primary
> >> >> >> >key. Is it an error or do we silently accept it?
> >> >> >> >- nit: CONSTRAINT %s FIELDS (%s) NO UPDATE%s -> you mean COLUMNS
> >> >> >> >instead of FIELDS, right?
> >> >> >> >
> >> >> >> >Kind regards,
> >> >> >> >Gustavo
> >> >> >> >
> >> >> >> >
> >> >> >> >On Fri, 13 Feb 2026 at 10:08, Xuyang <[email protected]> wrote:
> >> >> >> >
> >> >> >> >> Hi, everyone.
> >> >> >> >> I’d like to propose FLIP-566: Introduce a new NO UPDATE column
> >> >> >> >> constraint[1].
> >> >> >> >> Flink has introduced the Delta Join, whose core advantage lies
> >> >> >> >> in replacing redundant local state storage with direct queries
> >> >> >> >> to external storage systems (e.g., Apache Fluss). It currently
> >> >> >> >> relies on the upsert key, which ensures correct changelog
> >> >> >> >> processing without UPDATE_BEFORE messages. But this assumes
> >> >> >> >> that the join key is part of the primary key.
> >> >> >> >> As modern storage systems increasingly support general-purpose
> >> >> >> >> secondary indexes (not limited to primary keys), this
> >> >> >> >> restriction is becoming outdated. We need a new semantic
> >> >> >> >> mechanism to guarantee the immutability of the join key:
> >> >> >> >> specifically, that for a given primary key, the column values
> >> >> >> >> comprising the join key cannot be modified.
> >> >> >> >> Looking forward to your feedback.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> [1]
> >> >> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-566%3A+Introduce+a+new+NO+UPDATE+constraint
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >>
> >> >> >> >> Best!
> >> >> >> >> Xuyang
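
The two readings of the constraint debated in this thread ("immutable while the row exists" vs. "immutable for this PK forever") can be illustrated with a small Python sketch. This is not Flink code and not the FLIP's enforcement logic: `Change`, `find_violations` and the `pk_lifetime` flag are hypothetical names made up for illustration, and the sample data mirrors the (pk1, a1, b1) example from the thread with only the immutable column kept.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Change:
    kind: str       # changelog row kind: "+I", "-U", "+U" or "-D"
    pk: str         # primary key value
    immutable: str  # value of the column declared immutable (hypothetical model)


def find_violations(changes: List[Change], pk_lifetime: bool) -> List[int]:
    """Return indexes of records whose immutable value differs from the one
    remembered for the same primary key.

    pk_lifetime=True  : the value is remembered across deletes.
    pk_lifetime=False : a -D ends the row, so the value is forgotten.
    """
    remembered: Dict[str, str] = {}
    violations: List[int] = []
    for index, change in enumerate(changes):
        previous = remembered.get(change.pk)
        if previous is not None and previous != change.immutable:
            violations.append(index)
        if change.kind == "-D" and not pk_lifetime:
            # Row-lifetime reading: forget the value once the row is deleted.
            remembered.pop(change.pk, None)
        else:
            # Keep the first value seen for this key.
            remembered.setdefault(change.pk, change.immutable)
    return violations


# Delete followed by re-insert with a different value.
recreated = [
    Change("+I", "pk1", "b1"),
    Change("-D", "pk1", "b1"),
    Change("+I", "pk1", "b2"),
]
# In-place update that changes the immutable column.
updated = [
    Change("+I", "pk1", "b1"),
    Change("-U", "pk1", "b1"),
    Change("+U", "pk1", "b2"),
]

print(find_violations(recreated, pk_lifetime=True))   # [2]
print(find_violations(recreated, pk_lifetime=False))  # []
print(find_violations(updated, pk_lifetime=True))     # [2]
print(find_violations(updated, pk_lifetime=False))    # [2]
```

Under these assumptions, only the delete-then-reinsert sequence separates the two readings: the PK-lifetime reading flags it, the row-lifetime reading accepts it, and both reject the in-place update.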
