Hey Xuyang,

Thanks for the update and for driving this. Sounds good and +1 from my side.

Kind regards,
Gustavo

On Tue, 3 Mar 2026 at 09:28, Xuyang <[email protected]> wrote:

> Great idea! I've replaced all "no update" with "immutable" in the FLIP.
> Since the title has been updated, here is the new link[1] to this FLIP.
> Additionally, if there are no further comments, I will initiate the voting
> later tomorrow.
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-566%3A+Introduce+a+new+IMMUTABLE+columns+constraint
>
>
>
> --
>
>     Best!
>     Xuyang
>
>
>
> At 2026-03-03 01:01:17, "Gustavo de Morais" <[email protected]>
> wrote:
> >Hey Xuyang,
> >
> >Thanks for the reply and the discussion. I now understand your primary goal
> >with the FLIP. If we want to enable these optimizations, PK-lifetime
> >immutability makes sense!
> >
> >I'm fine with keeping the deletion restriction for v1. Suggestion: what if
> >we called it IMMUTABLE instead of NO UPDATE? It's shorter and more naturally
> >implies PK-lifetime semantics rather than just no UPDATE operations.
> >If you'd rather go with NO UPDATE, we could introduce a separate NO UPDATE
> >ALLOW DELETES variant later if there's demand.
> >
> >Kind regards,
> >Gustavo
> >
> >
> >
> >On Sat, 28 Feb 2026 at 04:48, Xuyang <[email protected]> wrote:
> >
> >> Hi, Gustavo. I’d be glad to explore a more relaxed design together with
> >> you! Let me share my thoughts.
> >>
> >>
> >> When designing this FLIP, my original intention for the "no update" column
> >> semantics was: as long as the primary key remains the same, the values in
> >> these columns must not change. This allows us to safely treat the "no
> >> update" columns as part of the source's unique (upsert) key, without
> >> needing to track or interpret the row kind of intermediate results. The
> >> reason is that all intermediate data ultimately originates from the source,
> >> unless an explicit key transformation occurs (e.g. agg with grouping key).
> >>
> >>
> >> I did not prohibit -U (update-before) records because, in most storage
> >> systems, an update is treated as a single atomic operation; Flink merely
> >> decomposes it into -U and +U for internal processing. Storage systems can
> >> easily enforce the "no update" constraint by checking whether a single
> >> update violates immutability for certain columns.
> >>
> >>
> >> However, if we instead restrict the immutability guarantee to the span
> >> between +I and -D, we will lose the ability to leverage "no update" column
> >> information for optimizations, because upstream operators often emit
> >> multiple +I/-D record pairs. Consider the following example:
> >> ```
> >> create table src1(a int, b int, c int, primary key(a) not enforced,
> >> column(c) no update not enforced);
> >> create table src2(id int, d int, primary key(id) not enforced);
> >> select * from src1 join src2 on c = id;
> >> ```
> >> This join prevents the upstream from generating DropUpdateBefore because,
> >> despite being declared as immutable, column `c` can actually take different
> >> values for the same primary key `a` across different +I and -D events. As a
> >> result, the join key `c` is not stable, causing records to be shuffled to
> >> different parallel tasks. Consequently, the system must rely on -U to
> >> correctly process the data.
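
A toy model of the shuffle problem described above, as a rough sketch only. It assumes a modulo partitioner on the join key `c` and a per-subtask map keyed by `a`; neither is Flink's actual key-group assignment or join state layout, and the function name `apply_changes` is made up for illustration.

```python
from collections import defaultdict


def apply_changes(changes, parallelism, drop_update_before):
    """Route src1 rows (a, b, c) by join key c and keep per-subtask state."""
    state = defaultdict(dict)  # subtask index -> {primary key a: row}
    for kind, row in changes:
        if kind == "-U" and drop_update_before:
            continue  # models the DropUpdateBefore optimization
        subtask = row[2] % parallelism  # assumed stand-in for hash partitioning
        if kind in ("+I", "+U"):
            state[subtask][row[0]] = row
        else:  # "-U" or "-D" retracts the row on the subtask it was routed to
            state[subtask].pop(row[0], None)
    # Only report subtasks that still hold rows.
    return {s: rows for s, rows in state.items() if rows}


# Column c changes from 5 to 6 for the same primary key a = 1.
changes = [("+I", (1, 10, 5)), ("-U", (1, 10, 5)), ("+U", (1, 10, 6))]

print(apply_changes(changes, parallelism=4, drop_update_before=True))
print(apply_changes(changes, parallelism=4, drop_update_before=False))
```

With the update-before dropped, the old row stays behind on the subtask that received c = 5; with -U kept, that subtask retracts it. If `c` never changes for a given `a`, both runs end in the same state, which is the case the constraint is meant to guarantee.
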
> >>
> >>
> >>
> >>
> >>
> >> --
> >>
> >>     Best!
> >>     Xuyang
> >>
> >>
> >>
> >> At 2026-02-27 21:35:08, "Gustavo de Morais" <[email protected]>
> >> wrote:
> >> >Hey Xuyang,
> >> >
> >> >Thanks for the reply.
> >> >
> >> >- Could you give an example for "which could maybe lead to the no-update
> >> >metadata being incorrectly applied in some certain scenario"?
> >> >- Also, if we're banning -D because joins can't distinguish it from -U,
> >> >then by the same logic we'd need to ban -U, right? Could you specify that
> >> >in the FLIP?
> >> >
> >> >Just to clarify, I'm +1 for the FLIP, I'm just wondering if we could make
> >> >it more general.
> >> >
> >> >Kind regards,
> >> >Gustavo
> >> >
> >> >On Fri, 27 Feb 2026 at 06:38, Xuyang <[email protected]> wrote:
> >> >
> >> >> Hi, Gustavo.
> >> >> You're absolutely right! In an ideal scenario, the lifecycle of immutable
> >> >> columns should indeed be confined within the sequence +I, -U, +U, -D.
> >> >> However, in Flink today, we don't fully distinguish between (+I, -D) and
> >> >> (+U, -U) (e.g., at join nodes), which could maybe lead to the no-update
> >> >> metadata being incorrectly applied in some certain scenarios.
> >> >>
> >> >>
> >> >> Therefore, for simplicity, I'd prefer not to support -D for this first
> >> >> step. I'd like to hear your thoughts on this.
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >>     Best!
> >> >>     Xuyang
> >> >>
> >> >>
> >> >>
> >> >> At 2026-02-26 19:03:57, "Gustavo de Morais" <[email protected]>
> >> >> wrote:
> >> >> >Hey Xuyang,
> >> >> >
> >> >> >Thanks for the updates and reply!
> >> >> >
> >> >> >Regarding dropping the restriction: In my thinking, in Flink's changelog
> >> >> >semantics, -D ends the row's lifetime. When we see -D followed by +I for
> >> >> >the same PK, like in the example you gave, that's imo creating a new row
> >> >> >rather than updating the existing one. I don't think it makes sense to
> >> >> >start tracking relations between "conceptually different rows". If it were
> >> >> >an update and still the same row, I'd expect a +/-U instead.
> >> >> >
> >> >> >So I'm still inclined to +1 dropping it. That is, NO UPDATE would mean
> >> >> >"immutable while the row exists" rather than "immutable for this PK
> >> >> >forever". For DeltaJoin's stricter needs (no deletes), we could enforce
> >> >> >that separately during planning. Does that distinction make sense to you?
> >> >> >Let me know what you think.
> >> >> >
> >> >> >Kind regards,
> >> >> >Gustavo
> >> >> >
> >> >> >On Thu, 26 Feb 2026 at 05:17, Xuyang <[email protected]> wrote:
> >> >> >
> >> >> >> Hi Gustavo,
> >> >> >> Regarding your suggestion to remove the deletion restriction: it's not
> >> >> >> only tied to delta joins. My primary concern has been the significant
> >> >> >> overhead for the storage engine in tracking changes to no update
> >> >> >> columns across separate INSERT and DELETE messages.
> >> >> >> Consider this example, where col1 is the primary key and col3 is declared
> >> >> >> as a NO UPDATE column:
> >> >> >> Schema: (col1, col2, col3)
> >> >> >> 1)+I (pk1, a1, b1)
> >> >> >> 2)-D (pk1, a1, b1)
> >> >> >> 3)+I (pk1, a2, b2)
> >> >> >> In this sequence, the value of the no update column col3 changes from b1
> >> >> >> to b2, which violates the NO UPDATE constraint.
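
The sequence above can be checked with a small sketch under PK-lifetime semantics. The function name `find_violations` and the tuple layout are hypothetical and not part of the FLIP or of Flink.

```python
from typing import Dict, Iterable, List, Tuple

# A change is (row kind, row); row kind is one of "+I", "-U", "+U", "-D".
Change = Tuple[str, Tuple]


def find_violations(
    changes: Iterable[Change],
    pk_idx: int,
    immutable_idx: List[int],
) -> List[int]:
    """Return positions of changes whose immutable columns differ from the
    first values ever seen for the same primary key."""
    first_seen: Dict[object, Tuple] = {}
    violations = []
    for pos, (_kind, row) in enumerate(changes):
        key = row[pk_idx]
        values = tuple(row[i] for i in immutable_idx)
        # The key is deliberately not forgotten on -D: the constraint is
        # checked across delete and re-insert of the same primary key.
        if first_seen.setdefault(key, values) != values:
            violations.append(pos)
    return violations


changelog = [
    ("+I", ("pk1", "a1", "b1")),
    ("-D", ("pk1", "a1", "b1")),
    ("+I", ("pk1", "a2", "b2")),
]
print(find_violations(changelog, pk_idx=0, immutable_idx=[2]))  # prints [2]
```

Note that `first_seen` has to keep an entry for every primary key even after it is deleted, which is one way to picture the tracking overhead mentioned above.
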
> >> >> >> If we relax the restriction on DELETE operations, we would effectively
> >> >> >> shift the responsibility to users to guarantee that values of no update
> >> >> >> columns remain consistent across corresponding INSERT, DELETE, and UPDATE
> >> >> >> records.
> >> >> >> Given these implications, I’d like to hear your thoughts on whether we
> >> >> >> should proceed with removing this restriction.
> >> >> >>
> >> >> >>
> >> >> >> Separately, regarding the additional details you mentioned, I’ve added
> >> >> >> them to the FLIP. Here’s a quick summary:
> >> >> >> - If a user declares NO UPDATE (b, c) on a table without a primary key,
> >> >> >>   an error will be thrown: “NO UPDATE constraints must be defined on
> >> >> >>   tables with a primary key.”
> >> >> >> - If a user declares NO UPDATE(a) and column a is already part of the
> >> >> >>   primary key, the declaration is silently accepted.
> >> >> >> - Updated: CONSTRAINT %s COLUMNS (%s) NO UPDATE%s
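
The rules summarized above could look roughly like this. It is only a sketch: `ValidationError`, `validate_no_update`, and `describe` are hypothetical names, the meaning of the three `%s` placeholders (constraint name, column list, trailing suffix) is assumed, and returning only the non-PK columns is also an assumption.

```python
from typing import List, Sequence


class ValidationError(Exception):
    """Hypothetical error type; the thread only gives the message text."""


def validate_no_update(primary_key: Sequence[str], columns: Sequence[str]) -> List[str]:
    """Check a NO UPDATE declaration against the table's primary key."""
    if not primary_key:
        raise ValidationError(
            "NO UPDATE constraints must be defined on tables with a primary key."
        )
    # Columns already in the primary key are accepted silently. Returning
    # only the remaining columns is an assumption made for this sketch.
    return [c for c in columns if c not in primary_key]


def describe(name: str, columns: Sequence[str], suffix: str = "") -> str:
    # Fills the summary format string; placeholder meanings are assumed.
    return "CONSTRAINT %s COLUMNS (%s) NO UPDATE%s" % (name, ", ".join(columns), suffix)


print(validate_no_update(["a"], ["a", "c"]))
print(describe("ct1", ["b", "c"], " NOT ENFORCED"))
```
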
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >>
> >> >> >>     Best!
> >> >> >>     Xuyang
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> At 2026-02-19 20:10:11, "Gustavo de Morais" <[email protected]>
> >> >> >> wrote:
> >> >> >> >Hey Xuyang,
> >> >> >> >
> >> >> >> >That's an interesting concept, thanks for the proposal!
> >> >> >> >
> >> >> >> >I like the FLIP and I think this could open some other optimizations. That
> >> >> >> >said, I think it makes sense to remove the deletion restriction from the
> >> >> >> >FLIP, since it's mostly a necessity that comes from the DeltaJoin. We
> >> >> >> >could make NO UPDATE be about immutability, which is not directly connected
> >> >> >> >to row permanence. As far as I know, the DeltaJoin already enforces the
> >> >> >> >deletion restriction during planning for its sources, so it doesn't have to
> >> >> >> >be enforced by this functionality as well.
> >> >> >> >
> >> >> >> >Also, some small clarifications that could be added to the FLIP:
> >> >> >> >- If someone declares NO UPDATE (b, c) on a table without a primary key,
> >> >> >> >  I suppose that's an error?
> >> >> >> >- If someone declares NO UPDATE(a) and a is already a primary key, is it
> >> >> >> >  an error or do we silently accept it?
> >> >> >> >- nit: CONSTRAINT %s FIELDS (%s) NO UPDATE%s -> you mean COLUMNS instead
> >> >> >> >  of FIELDS, right?
> >> >> >> >
> >> >> >> >Kind regards,
> >> >> >> >Gustavo
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >On Fri, 13 Feb 2026 at 10:08, Xuyang <[email protected]> wrote:
> >> >> >> >
> >> >> >> >> Hi, everyone.
> >> >> >> >> I’d like to propose FLIP-566: Introduce a new NO UPDATE column
> >> >> >> >> constraint[1].
> >> >> >> >> Flink has introduced the Delta Join, whose core advantage lies in
> >> >> >> >> replacing redundant local state storage with direct queries to external
> >> >> >> >> storage systems (e.g., Apache Fluss). It currently relies on the upsert
> >> >> >> >> key, which ensures correct changelog processing without UPDATE_BEFORE
> >> >> >> >> messages. But this requires the join key to be part of the primary key.
> >> >> >> >> As modern storage systems increasingly support general-purpose secondary
> >> >> >> >> indexes (not limited to primary keys), this restriction is becoming
> >> >> >> >> outdated. We need a new semantic mechanism to guarantee the
> >> >> >> >> immutability of the join key: specifically, that for a given primary
> >> >> >> >> key, the column values comprising the join key cannot be modified.
> >> >> >> >> Looking forward to your feedback.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> [1]
> >> >> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-566%3A+Introduce+a+new+NO+UPDATE+constraint
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >>
> >> >> >> >>     Best!
> >> >> >> >>     Xuyang
> >> >> >>
> >> >>
> >>
>
