Re: [DISCUSS] FLIP-550: Add similar support for CREATE/ALTER operations for MATERIALIZED TABLEs as for TABLEs

Timo Walther Mon, 03 Nov 2025 06:03:16 -0800

Hi Sergey,

+1 for this FLIP. But one last comment:

@Ron: I'm fine with excluding adding/altering physical columns. But should we still allow the case where the data type of a column, the column name, or a COMMENT is declared?


E.g. take:

CREATE MATERIALIZED TABLE AS SELECT 'Hello';

By default it will result in (CHAR(5)). Shouldn't we allow users to define:


CREATE MATERIALIZED TABLE (name STRING) AS SELECT 'Hello';

We can forbid adding additional columns, but at least the one from the query could be better defined.


Also e.g.:

CREATE MATERIALIZED TABLE (name STRING COMMENT 'Comment of the user') AS SELECT 'Hello';


What do you think?

Cheers,
Timo





On 02.11.25 23:35, Sergey Nuyanzin wrote:

Great, thank you Ron!

If there is no more feedback,
I would suggest moving to the voting step [1].

[1] https://lists.apache.org/thread/x7k41wtmp15wrcg7dqpb1f8tw1wstk0s

On Thu, Oct 30, 2025 at 2:34 AM Ron Liu <[email protected]> wrote:


Thanks for the update, LGTM.

Best,
Ron

On Thu, Oct 30, 2025 at 08:18, Sergey Nuyanzin <[email protected]> wrote:

Cool, thank you.
I have updated the FLIP.
It would be great if you have time to check it and tell me if anything else
should be changed.

On Wed, Oct 29, 2025 at 2:47 AM Ron Liu <[email protected]> wrote:


Hi, Sergey

Makes sense to me.

Best,
Ron

On Wed, Oct 29, 2025 at 06:29, Sergey Nuyanzin <[email protected]> wrote:

Hi Ron,
thank you for your reply

I agree with the case for a compute column or metadata column, but still
don't think physical columns should be added. I haven't seen a real-world
case for it, so it shouldn't be supported with that syntax.

As mentioned above, I'm OK with excluding physical columns from this FLIP
and introducing a separate validation that forbids them.

If this sounds OK, I will update the FLIP page accordingly.

On Tue, Oct 28, 2025 at 2:45 AM Ron Liu <[email protected]> wrote:


Hi, Sergey

Sorry for the late reply.

About a more real-world case: sometimes it is required to pass extra
information, e.g. headers, with the help of compute or metadata columns.

We can add extra validation stating that physical columns are not
allowed to be added/modified/dropped.

However, even with metadata/compute columns it will require rewriting
the query (which will be done as part of the operation).
WDYT?

I agree with the case for a compute column or metadata column, but I still
don't think physical columns should be added. I haven't seen a real-world
case for it, so it shouldn't be supported with that syntax.


Best,
Ron


On Tue, Oct 21, 2025 at 17:19, Sergey Nuyanzin <[email protected]> wrote:

Hi Ron,
sorry for the delay

About a more real-world case: sometimes it is required to pass extra
information, e.g. headers, with the help of compute or metadata columns.

We can add extra validation stating that physical columns are not
allowed to be added/modified/dropped.

However, even with metadata/compute columns it will require rewriting
the query (which will be done as part of the operation).
WDYT?

A new question: regarding the operation ALTER MATERIALIZED TABLE MyTable
ADD PARTITION (p1=1, p2='a') WITH ('k1'='v1'), what is its intended
semantics? Is it purely a metadata-only operation, or does it also trigger
a job to refresh data for the new partition? I believe it should be the
latter. What’s your view?


Yes, it should also trigger a job after that.

On Fri, Oct 10, 2025 at 4:39 AM Ron Liu <[email protected]> wrote:


Hi, Sergey.

Thanks for your quick response.

Probably one of the main use cases here is column name reservation for
the future.

I haven’t seen a concrete real-world business use case for this feature. My
concern is: if the sole motivation is to align syntactically with CTAS
(Create Table As Select), what is its actual value and significance?

Isn't it the same in the case of tables and pipelines bound to them?

I was thinking that since there is MaterializedTableManager, then
based on coming TableChangeOperation
it could decide then how to process such change: for example full
recompute or something more sophisticated

I believe this is not entirely equivalent to a regular table:

    - A materialized table consists of multiple components: table metadata,
    pipeline, and data. The pipeline is an integral part of the materialized
    table and is managed by it. We must ensure the stability and consistency
    of all these components.
    - Unlike regular tables, the schema and data of a materialized table are
    derived from and continuously updated by its defining query. Therefore,
    when adding, modifying, or dropping columns, the correct approach should
    be to first update the query, and let the query drive the schema change.
    This is logically sound and delivers real business value. This capability
    is already supported in FLIP-492[1] and FLIP-546 and can be further
    extended.
    - Allowing column modifications via ALTER MATERIALIZED TABLE
    ADD/MODIFY/DROP COLUMN would cause a mismatch between the materialized
    table’s query definition and its physical schema, leading to inconsistency
    and significantly increasing operational and observability costs. It may
    also confuse the user.
    - Due to the special nature of materialized tables, metadata
    modifications must be handled with great caution. We should not allow
    arbitrary schema changes: for example, we cannot freely reorder columns or
    change column types, and we may not even allow arbitrary column deletions,
    as these could break data compatibility. Moreover, if the underlying
    physical storage doesn’t support such changes, the pipeline may fail to
    run, and the MaterializedTableManager would be unable to handle it. In
    such cases, the best solution is for the user to explicitly recreate the
    materialized table. We must be cautious with user data: we should not
    silently rebuild the physical table or reprocess historical data on the
    user’s behalf. Additionally, from a technical perspective, we may
    currently lack the capability to perform a full historical backfill in
    MaterializedTableManager, so such operations should be explicitly
    triggered by the user.

A new question: regarding the operation ALTER MATERIALIZED TABLE MyTable
ADD PARTITION (p1=1, p2='a') WITH ('k1'='v1'), what is its intended
semantics? Is it purely a metadata-only operation, or does it also trigger
a job to refresh data for the new partition? I believe it should be the
latter. What’s your view?


With respect to the operations proposed in the FLIP, I think we can support
those that only affect metadata and do not impact the materialized table’s
query logic or data update behavior. The following operations are
acceptable:


    - Support defining watermarks when creating a MATERIALIZED TABLE.
    - Support specifying a column_list when creating a MATERIALIZED TABLE.
    - Support ALTER MATERIALIZED TABLE to add watermarks, primary keys, or
    partitions.
    - Support ALTER MATERIALIZED TABLE to drop watermarks, primary keys, or
    partitions.
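
As a rough illustration of the operations listed above, here is a sketch in Flink SQL. It assumes the materialized table syntax would simply mirror the existing CREATE TABLE / ALTER TABLE clauses; the table and column names (orders_mt, orders, order_id, order_time) are hypothetical, and the final grammar is whatever the FLIP ends up specifying.

```sql
-- Assumed syntax: a watermark declared at creation time, as in CREATE TABLE.
-- The physical columns still come from the query.
CREATE MATERIALIZED TABLE orders_mt (
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
)
FRESHNESS = INTERVAL '1' MINUTE
AS SELECT order_id, order_time FROM orders;

-- Assumed syntax: metadata-only changes, mirroring ALTER TABLE.
ALTER MATERIALIZED TABLE orders_mt ADD PRIMARY KEY (order_id) NOT ENFORCED;
ALTER MATERIALIZED TABLE orders_mt DROP PRIMARY KEY;
ALTER MATERIALIZED TABLE orders_mt DROP WATERMARK;
```

None of these statements changes the defining query, which is what makes them candidates for the "acceptable" set.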

For all other operations that would affect data updates or require query
rewriting, I remain cautious and reserved.


[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-492%3A+Support+Query+Modifications+for+Materialized+Tables
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-546%3A+Introduce+CREATE+OR+ALTER+for+Materialized+Tables


Best,
Ron

On Thu, Oct 9, 2025 at 18:38, Sergey Nuyanzin <[email protected]> wrote:

Hi Ron,

thank you for the feedback

When adding new columns via CREATE or ALTER that are not included in the
defining query of the Materialized Table, who is responsible for updating
the data in these new columns?

When we detect new columns that are not present in the query, we:
1) validate that the type of each of them is nullable (otherwise throw
ValidationException)
2) merge them into the schema
3) rewrite the materialized table query so that it fills the
newly added columns with nulls.
This means that the rewritten query is responsible for filling in
these null values.
The approach is similar to the way CTAS behaves in the same situation.
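
To make the three steps concrete, here is a minimal sketch in Flink SQL. The names (mt, src, reserved_col) are hypothetical, the column-list syntax is the one under discussion rather than an existing feature, and the exact form and column order of the rewritten query are assumptions.

```sql
-- Hypothetical statement: reserved_col is declared but not produced by the query.
-- Its type is nullable, so the validation in step 1 would pass.
CREATE MATERIALIZED TABLE mt (
  reserved_col STRING
)
FRESHNESS = INTERVAL '1' MINUTE
AS SELECT id, name FROM src;

-- Steps 2 and 3: after merging reserved_col into the schema, the defining
-- query could be rewritten along these lines (assumed shape):
SELECT CAST(NULL AS STRING) AS reserved_col, id, name FROM src;
```

Declaring the column as STRING NOT NULL instead would be rejected in step 1, since there is nothing but NULL to fill it with.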


Probably one of the main use cases here is column name reservation for
the future.

For ALTER MATERIALIZED TABLE operations like ADD/MODIFY/DROP columns, these
changes could cause the pipeline bound to the Materialized Table to fail.

Isn't it the same in the case of tables and pipelines bound to them?
I was thinking that since there is MaterializedTableManager, then
based on coming TableChangeOperation
it could decide then how to process such change: for example full
recompute or something more sophisticated

Looking forward to your comments.

On Thu, Oct 9, 2025 at 11:13 AM Ron Liu <[email protected]> wrote:


Hi, Sergey.

I was on vacation recently, so sorry for joining this discussion so late.


I’ve carefully reviewed the FLIP, and purely from the perspective of
aligning Materialized Table operations with those of a regular Table, I
support this proposal in principle. However, in my understanding,
Materialized Tables and regular Tables are fundamentally different. A
Materialized Table is bound to a specific pipeline that updates its
data; this pipeline is generated from the associated query. In contrast, a
regular Table isn’t tied to any pipeline; users manually write queries to
update its data. Performing an ALTER operation on a regular Table only
modifies metadata, whereas performing ALTER on a Materialized Table affects
not only metadata but also the underlying data update mechanism.


Given this context, I have the following questions:
1. When adding new columns via CREATE or ALTER that are not included in the
defining query of the Materialized Table, who is responsible for updating
the data in these new columns? I’m unclear about the purpose and use case
for adding such columns. Could you provide a concrete example?
2. For ALTER MATERIALIZED TABLE operations like ADD/MODIFY/DROP columns,
these changes could cause the pipeline bound to the Materialized
Table to fail. What is the exact execution flow for these operations? Could
you elaborate on the runtime behavior for each type of operation? Since
these actions impact actual data updates, not just metadata, this is a
critical concern.

In summary, I believe we shouldn’t blindly apply all regular Table
operations directly to Materialized Tables. Instead, we should selectively
support a subset of operations based on real-world usage scenarios and
semantic correctness. What’s your take on this?

Best,
Ron

On Wed, Oct 8, 2025 at 08:05, Sergey Nuyanzin <[email protected]> wrote:

Hi Lincoln,

Thank you for your feedback.

I guess we already have similar behavior for CTAS, where we can declare
more columns than the query produces.
In this case these extra columns should be filled with nulls, and the
query should be rewritten accordingly [1].
This also means that the extra columns must have a nullable type (there is
a dedicated validation for this).
In other words, non-query columns get NULL as their default value, and the
query is rewritten to take them into account.

Regarding adding columns with ALTER, or other changes like
adding/dropping columns, constraints, or distribution:
if I understand correctly, MaterializedTableManager, looking at the table
change, can decide whether it should recompute the materialized table or
not.
Would that make sense?

[1] https://github.com/apache/flink/blob/3478ddf08bce49e271f69b922a37ccada6f58688/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/operations/converters/table/SqlCreateTableAsConverter.java#L66-L74



On Tue, Oct 7, 2025 at 4:14 AM Lincoln Lee <[email protected]> wrote:


Thanks Sergey for driving this FLIP, it's a great addition to materialized
table!

Since it coincided with China's National Day holiday and everyone is still
on vacation, we couldn't reply promptly.

I haven't fully reviewed all the content in the FLIP yet, but there's an
important issue on the ALTER statement:

Unlike a regular CREATE TABLE, a Materialized Table derives its schema from
the defined query: columns are generated based on the query (and, similar
to a materialized view, the underlying data for these columns is tightly
coupled to the query definition). Therefore, we cannot simply interpret the
effect of a single `ALTER MATERIALIZED TABLE ADD New_Column` statement.
Supporting this likely requires an accompanying column default value, and
raises compatibility concerns regarding historical data. That is a complex
topic we previously discussed offline during the design process of FLIP-492.

Also, once Ron is back in the office, he may give a more detailed comment.



Best,
Lincoln Lee


On Thu, Oct 2, 2025 at 20:15, Sergey Nuyanzin <[email protected]> wrote:

Thank you Ramin

If there is no more feedback or objections,
I will start a voting thread next week.

On Thu, Sep 25, 2025 at 10:43 AM Ramin Gharib <[email protected]> wrote:


Hi Sergey,
Thanks for driving this! This sounds good to me! +1

Cheers,

Ramin

On Wed, Sep 24, 2025 at 2:14 PM Sergey Nuyanzin <[email protected]> wrote:

Hi everyone,
I'd like to start a discussion of FLIP-550:
Add similar support for CREATE/ALTER operations for MATERIALIZED
TABLEs as for TABLEs [1].

This FLIP is another step towards making tables and materialized
tables more consistent. There was already one improvement in that
direction, FLIP-542 [2], to add DISTRIBUTION and SHOW MATERIALIZED
TABLES support. However, several more differences were noticed when
comparing the behavior of CREATE and ALTER operations. For instance, right
now for materialized tables it is impossible to set anything but a table
constraint, while for tables (CREATE TABLE AS) it has been possible to
provide a schema definition since FLIP-463 [3]. Also, ALTER operations
for TABLE are far more mature than for MATERIALIZED TABLE. This FLIP
aims to reduce the difference by enabling more similar features
for materialized tables.

Introducing schema definition support for materialized tables will
provide users with greater control and flexibility and will also unify
the usage of tables and materialized tables.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=387648095
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-542%3A+Make+materialized+table+DDL+consistent+with+regular+tables
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement


--
Best regards,
Sergey




