Re: [DISCUSS] FLIP-550: Add similar support for CREATE/ALTER operations for MATERIALIZED TABLEs as for TABLEs

Ron Liu Tue, 28 Oct 2025 18:46:57 -0700

Hi, Sergey

Makes sense to me.


Best,
Ron

On Wed, Oct 29, 2025 at 06:29, Sergey Nuyanzin <[email protected]> wrote:

> Hi Ron,
> thank you for your reply
>
> >>> I agree with the case for a computed column or metadata column, but I
> still don't think physical columns should be added. I haven't seen a
> real-world case for it, so it shouldn't be supported with that syntax.
>
> As mentioned above, I'm OK with excluding physical columns from this FLIP
> and introducing a separate validation that forbids them.
>
> If this sounds OK, I will update the FLIP page accordingly.
>
> On Tue, Oct 28, 2025 at 2:45 AM Ron Liu <[email protected]> wrote:
> >
> > Hi, Sergey
> >
> > Sorry for the late reply.
> >
> > >>> About a more real-world case: sometimes it is required to pass extra
> > information, e.g. headers, with the help of computed or metadata
> > columns.
> >
> > We can add extra validation telling that physical columns are not
> > allowed to be added/modified/dropped.
> >
> > However even with metadata/compute columns it will require rewriting
> > the query (which will be done as a part of the operation).
> > WDYT?
> >
> > I agree with the case for a computed column or metadata column, but I
> > still don't think physical columns should be added. I haven't seen a
> > real-world case for it, so it shouldn't be supported with that syntax.
> >
> >
> > Best,
> > Ron
> >
> >
> > On Tue, Oct 21, 2025 at 17:19, Sergey Nuyanzin <[email protected]> wrote:
> >
> > > Hi Ron,
> > > sorry for the delay
> > >
> > > About a more real-world case: sometimes it is required to pass extra
> > > information, e.g. headers, with the help of computed or metadata
> > > columns.
> > >
> > > We can add extra validation telling that physical columns are not
> > > allowed to be added/modified/dropped.
> > >
> > > However even with metadata/compute columns it will require rewriting
> > > the query (which will be done as a part of the operation).
> > > WDYT?
> > >
> > > > >A new question: regarding the operation ALTER MATERIALIZED TABLE MyTable
> > > > >ADD PARTITION (p1=1, p2='a') WITH ('k1'='v1'), what is its intended
> > > > >semantics? Is it purely a metadata-only operation, or does it also trigger
> > > > >a job to refresh data for the new partition? I believe it should be the
> > > > >latter. What’s your view?
> > >
> > > Yes, it should also trigger a job after that.
> > >
> > > On Fri, Oct 10, 2025 at 4:39 AM Ron Liu <[email protected]> wrote:
> > > >
> > > > Hi, Sergey.
> > > >
> > > > Thanks for your quick response.
> > > >
> > > > >>> Probably one of the main use cases here is column name reservation
> > > > for the future.
> > > >
> > > > I haven’t seen a concrete real-world business use case for this feature.
> > > > My concern is: if the sole motivation is to align syntactically with CTAS
> > > > (Create Table As Select), what is its actual value and significance?
> > > >
> > > > >>> Isn't it the same in the case of tables and pipelines bound to them?
> > > > I was thinking that since there is MaterializedTableManager, then
> > > > based on coming TableChangeOperation
> > > > it could decide then how to process such change: for example full
> > > > recompute or something more sophisticated
> > > >
> > > > I believe this is not entirely equivalent to a regular table:
> > > >
> > > >    - A materialized table consists of multiple components: table metadata,
> > > >    pipeline, and data. The pipeline is an integral part of the materialized
> > > >    table and is managed by it. We must ensure the stability and consistency
> > > >    of all these components.
> > > >    - Unlike regular tables, the schema and data of a materialized table are
> > > >    derived from and continuously updated by its defining query. Therefore,
> > > >    when adding, modifying, or dropping columns, the correct approach should
> > > >    be to first update the query, and let the query drive the schema change.
> > > >    This is logically sound and delivers real business value. This capability
> > > >    is already supported in FLIP-492 [1] and FLIP-546 [2] and can be further
> > > >    extended.
> > > >    - Allowing column modifications via ALTER MATERIALIZED TABLE
> > > >    ADD/MODIFY/DROP COLUMN would cause a mismatch between the materialized
> > > >    table’s query definition and its physical schema, leading to
> > > >    inconsistency and significantly increasing operational and observability
> > > >    costs. It may also confuse the user.
> > > >    - Due to the special nature of materialized tables, metadata
> > > >    modifications must be handled with great caution. We should not allow
> > > >    arbitrary schema changes: for example, we cannot freely reorder columns
> > > >    or change column types, and we may not even allow arbitrary column
> > > >    deletions, as these could break data compatibility. Moreover, if the
> > > >    underlying physical storage doesn’t support such changes, the pipeline
> > > >    may fail to run, and the MaterializedTableManager would be unable to
> > > >    handle it. In such cases, the best solution is for the user to explicitly
> > > >    recreate the materialized table. We must be cautious with user data: we
> > > >    should not silently rebuild the physical table or reprocess historical
> > > >    data on the user’s behalf. Additionally, from a technical perspective, we
> > > >    may currently lack the capability to perform a full historical backfill
> > > >    in MaterializedTableManager, so such operations should be explicitly
> > > >    triggered by the user.
> > > >
> > > > A new question: regarding the operation ALTER MATERIALIZED TABLE MyTable
> > > > ADD PARTITION (p1=1, p2='a') WITH ('k1'='v1'), what is its intended
> > > > semantics? Is it purely a metadata-only operation, or does it also trigger
> > > > a job to refresh data for the new partition? I believe it should be the
> > > > latter. What’s your view?
> > > >
> > > >
> > > > With respect to the operations proposed in the FLIP, I think we can support
> > > > those that only affect metadata and do not impact the materialized table’s
> > > > query logic or data update behavior. The following operations are
> > > > acceptable:
> > > >
> > > >    - Support defining watermarks when creating a MATERIALIZED TABLE.
> > > >    - Support specifying a column_list when creating a MATERIALIZED TABLE.
> > > >    - Support ALTER MATERIALIZED TABLE to add watermarks, primary keys, or
> > > >    partitions.
> > > >    - Support ALTER MATERIALIZED TABLE to drop watermarks, primary keys, or
> > > >    partitions.
> > > >
> > > > For all other operations that would affect data updates or require query
> > > > rewriting, I remain cautious and reserved.
> > > >
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-492%3A+Support+Query+Modifications+for+Materialized+Tables
> > > >
> > > > [2]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-546%3A+Introduce+CREATE+OR+ALTER+for+Materialized+Tables
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > > On Thu, Oct 9, 2025 at 18:38, Sergey Nuyanzin <[email protected]> wrote:
> > > >
> > > > > Hi Ron,
> > > > >
> > > > > thank you for the feedback
> > > > >
> > > > > >When adding new columns via CREATE or ALTER that are not included in the
> > > > > >defining query of the Materialized Table, who is responsible for updating
> > > > > >the data in these new columns?
> > > > >
> > > > > When we detect new columns that are not present in the query, we
> > > > > 1) validate that the type of each of them is nullable (otherwise throw
> > > > > ValidationException)
> > > > > 2) merge them into the schema
> > > > > 3) rewrite the materialized table's query so that it fills the
> > > > > newly added columns with nulls.
> > > > > This means that the rewritten query is responsible for filling in
> > > > > these null values.
> > > > > The approach is similar to the way CTAS behaves in the same situation.
> > > > >
> > > > > Probably one of the main use cases here is column name reservation
> > > > > for the future.
> > > > >
> > > > > >For ALTER MATERIALIZED TABLE operations like ADD/MODIFY/DROP
> > > > > >columns, these changes could cause the pipeline bound to the
> > > > > >Materialized Table to fail.
> > > > >
> > > > > Isn't it the same in the case of tables and pipelines bound to them?
> > > > > I was thinking that since there is MaterializedTableManager, then
> > > > > based on coming TableChangeOperation
> > > > > it could decide then how to process such change: for example full
> > > > > recompute or something more sophisticated
> > > > >
> > > > > Looking forward to your comments
> > > > >
> > > > > On Thu, Oct 9, 2025 at 11:13 AM Ron Liu <[email protected]> wrote:
> > > > > >
> > > > > > Hi, Sergey.
> > > > > >
> > > > > > I was on vacation recently, so sorry for joining this discussion so late.
> > > > > >
> > > > > > I’ve carefully reviewed the FLIP, and purely from the perspective of
> > > > > > aligning Materialized Table operations with those of a regular Table, I
> > > > > > support this proposal in principle. However, in my understanding,
> > > > > > Materialized Tables and regular Tables are fundamentally different. A
> > > > > > Materialized Table is bound to a specific pipeline that updates its
> > > > > > data; this pipeline is generated from the associated query. In contrast, a
> > > > > > regular Table isn’t tied to any pipeline; users manually write queries to
> > > > > > update its data. Performing an ALTER operation on a regular Table only
> > > > > > modifies metadata, whereas performing ALTER on a Materialized Table affects
> > > > > > not only metadata but also the underlying data update mechanism.
> > > > > >
> > > > > > Given this context, I have the following questions:
> > > > > > 1. When adding new columns via CREATE or ALTER that are not included in the
> > > > > > defining query of the Materialized Table, who is responsible for updating
> > > > > > the data in these new columns? I’m unclear about the purpose and use case
> > > > > > for adding such columns. Could you provide a concrete example?
> > > > > > 2. For ALTER MATERIALIZED TABLE operations like ADD/MODIFY/DROP
> > > > > > columns, these changes could cause the pipeline bound to the Materialized
> > > > > > Table to fail. What is the exact execution flow for these operations? Could
> > > > > > you elaborate on the runtime behavior for each type of operation? Since
> > > > > > these actions impact actual data updates, not just metadata, this is a
> > > > > > critical concern.
> > > > > >
> > > > > > In summary, I believe we shouldn’t blindly apply all regular Table
> > > > > > operations directly to Materialized Tables. Instead, we should selectively
> > > > > > support a subset of operations based on real-world usage scenarios and
> > > > > > semantic correctness. What’s your take on this?
> > > > > >
> > > > > > Best,
> > > > > > Ron
> > > > > >
> > > > > > On Wed, Oct 8, 2025 at 08:05, Sergey Nuyanzin <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Lincoln,
> > > > > > >
> > > > > > > Thank you for your feedback.
> > > > > > >
> > > > > > > I guess we already have similar behavior for CTAS, where we can declare
> > > > > > > more columns than the query produces.
> > > > > > > In this case these extra columns should be filled with nulls, and the
> > > > > > > query should be rewritten accordingly [1].
> > > > > > > This also means that the extra columns must have a nullable type (there
> > > > > > > is a dedicated validation for this).
> > > > > > > In other words, non-query columns get null as their default value, and
> > > > > > > the query is rewritten to take them into account.
> > > > > > >
> > > > > > > Regarding adding columns with ALTER, or other changes like
> > > > > > > adding/dropping columns, constraints, or distribution:
> > > > > > > if I understand correctly, the MaterializedTableManager can look at the
> > > > > > > table change and decide whether it should recompute the materialized
> > > > > > > table or not.
> > > > > > >
> > > > > > > Would that make sense?
> > > > > > >
> > > > > > > [1]
> > > > > > > https://github.com/apache/flink/blob/3478ddf08bce49e271f69b922a37ccada6f58688/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/operations/converters/table/SqlCreateTableAsConverter.java#L66-L74
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Oct 7, 2025 at 4:14 AM Lincoln Lee <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Thanks Sergey for driving this FLIP, it's a great addition to
> > > > > > > > materialized table!
> > > > > > > >
> > > > > > > > Since it coincided with China's National Day holiday and everyone is
> > > > > > > > still on vacation, we couldn't reply promptly.
> > > > > > > >
> > > > > > > > I haven't fully reviewed all the content in the FLIP yet, but there's an
> > > > > > > > important issue on the ALTER statement:
> > > > > > > >
> > > > > > > > Unlike a regular CREATE TABLE, a Materialized Table derives its schema
> > > > > > > > from the defined query: columns are generated based on the query (and,
> > > > > > > > similar to a materialized view, the underlying data for these columns is
> > > > > > > > tightly coupled to the query definition). Therefore, we cannot simply
> > > > > > > > interpret the effect of a single `ALTER MATERIALIZED TABLE ADD New_Column`
> > > > > > > > statement. Supporting this likely requires an accompanying column default
> > > > > > > > value, and raises compatibility concerns regarding historical data. That
> > > > > > > > is a complex topic we previously discussed offline during the design
> > > > > > > > process of FLIP-492.
> > > > > > > >
> > > > > > > > Also, once Ron is back in the office, he may give a more detailed
> > > > > > > > comment.
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Lincoln Lee
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Oct 2, 2025 at 20:15, Sergey Nuyanzin <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Thank you Ramin
> > > > > > > > >
> > > > > > > > > In case there is no more feedback/objections
> > > > > > > > > I would start voting thread next week
> > > > > > > > >
> > > > > > > > > On Thu, Sep 25, 2025 at 10:43 AM Ramin Gharib <[email protected]>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Sergey,
> > > > > > > > > > Thanks for driving this! This sounds good to me! +1
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > >
> > > > > > > > > > Ramin
> > > > > > > > > >
> > > > > > > > > > On Wed, Sep 24, 2025 at 2:14 PM Sergey Nuyanzin <[email protected]>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi everyone,
> > > > > > > > > > > I'd like to start a discussion of FLIP-550:
> > > > > > > > > > > Add similar support for CREATE/ALTER operations for MATERIALIZED
> > > > > > > > > > > TABLEs as for TABLEs [1].
> > > > > > > > > > >
> > > > > > > > > > > This FLIP is another step towards making tables and materialized
> > > > > > > > > > > tables more consistent. There was already one improvement in that
> > > > > > > > > > > direction, FLIP-542 [2], to add DISTRIBUTION and SHOW MATERIALIZED
> > > > > > > > > > > TABLES support. However, several more differences were noticed when
> > > > > > > > > > > comparing the behavior of CREATE and ALTER operations. For instance,
> > > > > > > > > > > right now for materialized tables it is impossible to set anything but
> > > > > > > > > > > a table constraint, while for tables (CREATE TABLE AS) it has been
> > > > > > > > > > > possible to provide a schema definition since FLIP-463 [3]. Also, ALTER
> > > > > > > > > > > operations for TABLE are far more mature than for MATERIALIZED TABLE.
> > > > > > > > > > > This FLIP aims to reduce the difference by enabling more similar
> > > > > > > > > > > features for materialized tables.
> > > > > > > > > > >
> > > > > > > > > > > Introducing schema definition support for materialized tables will
> > > > > > > > > > > provide users with greater control and flexibility and will also unify
> > > > > > > > > > > the usage of tables and materialized tables.
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=387648095
> > > > > > > > > > >
> > > > > > > > > > > [2]
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-542%3A+Make+materialized+table+DDL+consistent+with+regular+tables
> > > > > > > > > > >
> > > > > > > > > > > [3]
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best regards,
> > > > > > > > > > > Sergey
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best regards,
> > > > > > > > > Sergey
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Sergey
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Sergey
> > > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Sergey
> > >
>
>
>
> --
> Best regards,
> Sergey
>
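
A minimal sketch of the validation agreed on above: changes to physical columns are rejected, while computed and metadata columns are allowed. This is an illustration in Python, not Flink code. The names `ColumnKind`, `ColumnChange`, `ValidationError` and `validate_column_changes` are hypothetical and do not come from the FLIP or the Flink code base; the actual validation would live in the planner and may look quite different.

```python
from dataclasses import dataclass
from enum import Enum


class ColumnKind(Enum):
    PHYSICAL = "physical"
    COMPUTED = "computed"
    METADATA = "metadata"


class ValidationError(Exception):
    """Hypothetical error for an ALTER request that touches a forbidden column kind."""


@dataclass(frozen=True)
class ColumnChange:
    action: str       # "ADD", "MODIFY" or "DROP"
    name: str         # column name
    kind: ColumnKind  # physical, computed or metadata


def validate_column_changes(changes):
    """Reject any change to a physical column; let computed/metadata changes through."""
    for change in changes:
        if change.kind is ColumnKind.PHYSICAL:
            raise ValidationError(
                f"{change.action} of physical column '{change.name}' is not supported "
                "for materialized tables; update the defining query instead."
            )
    return list(changes)
```

Under these assumptions, adding a metadata column such as `headers` passes the check, while adding a physical column raises `ValidationError`, which leaves the defining query as the only way to change the physical schema.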
