Hey Xuyang,

Thanks for the reply and the discussion. I now understand your primary goal with the FLIP. If we want to enable these optimizations, PK-lifetime immutability makes sense!

I'm fine with keeping the deletion restriction for v1. Suggestion: what if we called it IMMUTABLE instead of NO UPDATE? It's shorter and more naturally implies PK-lifetime semantics rather than just the absence of UPDATE operations. If you'd rather go with NO UPDATE, we could introduce a separate NO UPDATE ALLOW DELETES variant later if there's demand.

Kind regards,
Gustavo

On Sat, 28 Feb 2026 at 04:48, Xuyang <[email protected]> wrote:

> Hi, Gustavo. I’d be glad to explore a more relaxed design together with you! Let me share my thoughts.
>
> When designing this FLIP, my original intention for the "no update" column semantics was: as long as the primary key remains the same, the values in these columns must not change. This allows us to safely treat the "no update" columns as part of the source's unique (upsert) key, without needing to track or interpret the row kind of intermediate results. The reason is that all intermediate data ultimately originates from the source, unless an explicit key transformation occurs (e.g. agg with grouping key).
>
> I did not prohibit -U (update-before) records because, in most storage systems, an update is treated as a single atomic operation; Flink merely decomposes it into -U and +U for internal processing. Storage systems can easily enforce the "no update" constraint by checking whether a single update violates immutability for certain columns.
>
> However, if we instead restrict the immutability guarantee only between +I and -D, then because upstream operators often emit multiple +I/-D record pairs, we will lose the ability to leverage "no update" column information for optimizations.
> Consider the following example:
> ```
> create table src1(a int, b int, c int, primary key(a) not enforced,
> column(c) no update not enforced);
> create table src2(id int, d int, primary key(id) not enforced);
> select * from src1 join src2 on c = id;
> ```
> This join prevents the upstream from generating DropUpdateBefore because, despite being declared as immutable, column `c` can actually take different values for the same primary key `a` across different +I and -D events. As a result, the join key `c` is not actually stable, causing records to be shuffled to different parallel tasks. Consequently, the system must rely on -U to correctly process the data.
>
> --
>
> Best!
> Xuyang
>
> At 2026-02-27 21:35:08, "Gustavo de Morais" <[email protected]> wrote:
> >Hey Xuyang,
> >
> >Thanks for the reply.
> >
> >- Could you give an example for "which could lead to the no-update metadata being incorrectly applied in certain scenarios"?
> >- Also, if we're banning -D because joins can't distinguish it from -U, then by the same logic we'd need to ban -U, right? Could you specify that in the FLIP?
> >
> >Just to clarify, I'm +1 for the FLIP, I'm just wondering if we could make it more general.
> >
> >Kind regards,
> >Gustavo
> >
> >On Fri, 27 Feb 2026 at 06:38, Xuyang <[email protected]> wrote:
> >
> >> Hi, Gustavo.
> >> You're absolutely right! In an ideal scenario, the lifecycle of immutable columns should indeed be confined within the sequence +I, -U, +U, -D. However, in Flink today, we don't fully distinguish between (+I, -D) and (+U, -U) (e.g., at join nodes), which could lead to the no-update metadata being incorrectly applied in certain scenarios.
> >>
> >> Therefore, for simplicity, I'd prefer not to support -D for this first step. I'd like to hear your thoughts on this.
> >>
> >> --
> >>
> >> Best!
> >> Xuyang
> >>
> >> At 2026-02-26 19:03:57, "Gustavo de Morais" <[email protected]> wrote:
> >> >Hey Xuyang,
> >> >
> >> >Thanks for the updates and reply!
> >> >
> >> >Regarding dropping the restriction: In my thinking, in Flink's changelog semantics, -D ends the row's lifetime. When we see -D followed by +I for the same PK, like in the example you gave, that's imo creating a new row rather than updating the existing one. I don't think it makes sense to start tracking relations between "conceptually different rows". If it were an update and still the same row, I'd expect a +/-U instead.
> >> >
> >> >So I'm still inclining towards being +1 to drop it. That means, NO UPDATE would mean "immutable while the row exists" rather than "immutable for this PK forever". For DeltaJoin's stricter needs (no deletes), we could enforce that separately during planning. Does that distinction make sense to you? Let me know what you think.
> >> >
> >> >Kind regards,
> >> >Gustavo
> >> >
> >> >On Thu, 26 Feb 2026 at 05:17, Xuyang <[email protected]> wrote:
> >> >
> >> >> Hi Gustavo,
> >> >> Regarding your suggestion to remove the deletion restriction: it's not only tied to delta joins. My primary concern before has been the significant overhead for the storage engine in tracking no update column changes across separate INSERT and DELETE messages.
> >> >> Consider this example, where col1 is the primary key and col3 is declared as a NO UPDATE column:
> >> >> Schema: (col1, col2, col3)
> >> >> 1) +I (pk1, a1, b1)
> >> >> 2) -D (pk1, a1, b1)
> >> >> 3) +I (pk1, a2, b2)
> >> >> In this sequence, the value of the no update column col3 changes from b1 to b2, which violates the NO UPDATE constraint.
> >> >> If we relax the restriction on DELETE operations, we would effectively shift the responsibility to users to guarantee that values of no update columns remain consistent across corresponding INSERT, DELETE, and UPDATE records.
> >> >> Given these implications, I’d like to hear your thoughts on whether we should proceed with removing this restriction.
> >> >>
> >> >> Separately, regarding the additional details you mentioned, I’ve added them to the FLIP. Here’s a quick summary:
> >> >> - If a user declares NO UPDATE (b, c) on a table without a primary key, an error will be thrown: “NO UPDATE constraints must be defined on tables with a primary key.”
> >> >> - If a user declares NO UPDATE(a) and column a is already part of the primary key, the declaration is silently accepted.
> >> >> - Updated: CONSTRAINT %s COLUMNS (%s) NO UPDATE%s
> >> >>
> >> >> --
> >> >>
> >> >> Best!
> >> >> Xuyang
> >> >>
> >> >> At 2026-02-19 20:10:11, "Gustavo de Morais" <[email protected]> wrote:
> >> >> >Hey Xuyang,
> >> >> >
> >> >> >That's an interesting concept, thanks for the proposal!
> >> >> >
> >> >> >I like the FLIP and I think this could open some other optimizations. That said, I think it makes sense to remove the deletion restriction from the FLIP, since it's mostly a necessity that comes from the DeltaJoin. We could make NO UPDATE be about immutability, which is not directly connected to row permanence. As far as I know, the DeltaJoin already enforces the deletion restriction during planning for its sources, so it doesn't have to be enforced by this functionality as well.
> >> >> >
> >> >> >Also, some small clarifications that could be added to the FLIP:
> >> >> >- If someone declares NO UPDATE (b, c) on a table without a primary key. I suppose that's an error?
> >> >> >- If someone declares NO UPDATE(a) and a is already a primary key. Is it an error or do we silently accept it?
> >> >> >- nit: CONSTRAINT %s FIELDS (%s) NO UPDATE%s -> you mean COLUMNS instead of FIELDS, right?
> >> >> >
> >> >> >Kind regards,
> >> >> >Gustavo
> >> >> >
> >> >> >On Fri, 13 Feb 2026 at 10:08, Xuyang <[email protected]> wrote:
> >> >> >
> >> >> >> Hi, everyone.
> >> >> >> I’d like to propose FLIP-566: Introduce a new NO UPDATE column constraint[1].
> >> >> >> Flink has introduced the Delta Join, whose core advantage lies in replacing redundant local state storage with direct queries to external storage systems (e.g., Apache Fluss). It currently relies on the upsert key, which ensures correct changelog processing without UPDATE_BEFORE messages. But this assumes the join key must be part of the primary key.
> >> >> >> As modern storage systems increasingly support general-purpose secondary indexes (not limited to primary keys), this restriction is becoming outdated. We need a new semantic mechanism to guarantee the immutability of the join key: specifically, that for a given primary key, the column values comprising the join key cannot be modified.
> >> >> >> Looking forward to your feedback.
> >> >> >>
> >> >> >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-566%3A+Introduce+a+new+NO+UPDATE+constraint
> >> >> >>
> >> >> >> --
> >> >> >>
> >> >> >> Best!
> >> >> >> Xuyang
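
For reference, the two readings of NO UPDATE discussed in this thread ("immutable while the row exists" vs. "immutable for this PK forever") can be sketched as a small changelog check. This is only an illustrative Python sketch, not Flink code: the `Event` tuple layout, the function name `first_violation`, and the `pk_lifetime` flag are all hypothetical names made up for this example.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical event layout: (row_kind, primary_key, no_update_column_value)
Event = Tuple[str, str, str]


def first_violation(events: Iterable[Event], pk_lifetime: bool) -> Optional[int]:
    """Return the index of the first event that changes the NO UPDATE value, or None.

    pk_lifetime=True  : the value is fixed for the primary key forever.
    pk_lifetime=False : the value is fixed only while the row exists; a -D
                        ends the row, so a later +I may carry a new value.
    """
    seen: Dict[str, str] = {}
    for i, (kind, pk, value) in enumerate(events):
        # Any event for a tracked key must carry the remembered value.
        if pk in seen and seen[pk] != value:
            return i
        if kind == "-D" and not pk_lifetime:
            # Row-lifetime reading: forget the value once the row is deleted.
            seen.pop(pk, None)
        else:
            seen[pk] = value
    return None


# The sequence from the 26 Feb message, keeping only (kind, col1, col3).
events = [("+I", "pk1", "b1"), ("-D", "pk1", "b1"), ("+I", "pk1", "b2")]
print(first_violation(events, pk_lifetime=True))
print(first_violation(events, pk_lifetime=False))
```

Under the PK-lifetime reading, the third event is flagged because col3 changes from b1 to b2 for pk1; under the row-lifetime reading the same sequence is accepted, since the -D ends the row before the new +I arrives.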
