Hi Sergey,

+1 for this FLIP. But one last comment:

@Ron: I'm fine with excluding the addition/alteration of physical columns. But should we still allow users to declare a column's data type, name, or COMMENT?

E.g. take:

CREATE MATERIALIZED TABLE MyTable AS SELECT 'Hello';

By default, this results in a single column of type CHAR(5). Shouldn't we allow users to define:

CREATE MATERIALIZED TABLE MyTable (name STRING) AS SELECT 'Hello';

We can forbid adding extra columns, but users should at least be able to define the query's own columns more precisely.

Also e.g.:

CREATE MATERIALIZED TABLE MyTable (name STRING COMMENT 'Comment of the user') AS SELECT 'Hello';
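To make the proposed column-list semantics concrete, here is a minimal Python sketch. It assumes a hypothetical positional rule: the i-th declared column renames, retypes (via CAST), and optionally comments the i-th query column, while adding or dropping columns is rejected. `Column` and `build_projection` are illustrative names, not Flink APIs, and `EXPR$0` stands in for the auto-generated name of the unnamed expression.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Column:
    # Hypothetical model of a column, not a Flink class.
    name: str
    data_type: str
    comment: Optional[str] = None


def build_projection(query_sql: str, query_columns: List[Column],
                     declared: List[Column]) -> str:
    """Hypothetical rule: map declared columns positionally onto query columns.

    Renaming and retyping are allowed; adding or dropping columns is rejected.
    Comments do not affect the query; they would only be stored in the schema.
    """
    if len(declared) != len(query_columns):
        raise ValueError(
            f"declared {len(declared)} column(s) but the query produces "
            f"{len(query_columns)}; adding or dropping columns is not allowed"
        )
    parts = []
    for source, target in zip(query_columns, declared):
        expr = f"`{source.name}`"
        if source.data_type != target.data_type:
            # The user overrides the derived type, so an explicit CAST is needed.
            expr = f"CAST({expr} AS {target.data_type})"
        parts.append(f"{expr} AS `{target.name}`")
    return f"SELECT {', '.join(parts)} FROM ({query_sql})"


# SELECT 'Hello' derives a single CHAR(5) column.
query = [Column("EXPR$0", "CHAR(5)")]
declared = [Column("name", "STRING", "Comment of the user")]
print(build_projection("SELECT 'Hello'", query, declared))
# prints: SELECT CAST(`EXPR$0` AS STRING) AS `name` FROM (SELECT 'Hello')
```

A positional mapping is similar in spirit to the optional column-name list of CREATE VIEW; matching by name would be an equally valid choice for the FLIP to settle.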

What do you think?

Cheers,
Timo





On 02.11.25 23:35, Sergey Nuyanzin wrote:
Great, thank you Ron!

If there is no more feedback, I would suggest moving to the voting step [1].

[1] https://lists.apache.org/thread/x7k41wtmp15wrcg7dqpb1f8tw1wstk0s

On Thu, Oct 30, 2025 at 2:34 AM Ron Liu <[email protected]> wrote:

Thanks for the update, LGTM.

Best,
Ron

Sergey Nuyanzin <[email protected]> wrote on Thu, Oct 30, 2025 at 08:18:

Cool, thank you.
I have updated the FLIP. It would be great if you have time to check it and let me know if anything else should be changed.

On Wed, Oct 29, 2025 at 2:47 AM Ron Liu <[email protected]> wrote:

Hi, Sergey

Makes sense to me.

Best,
Ron

Sergey Nuyanzin <[email protected]> wrote on Wed, Oct 29, 2025 at 06:29:

Hi Ron,
Thank you for your reply.

> I agree with the case for a computed column or metadata column, but I still don't think physical columns should be added. I haven't seen a real-world case for it, so it shouldn't be supported with that syntax.

As mentioned above, I'm OK with excluding physical columns from this FLIP and introducing a separate validation that forbids them.

If this sounds OK, I will update the FLIP page accordingly.

On Tue, Oct 28, 2025 at 2:45 AM Ron Liu <[email protected]> wrote:

Hi, Sergey

Sorry for the late reply.

> About a more real-world case: sometimes it is required to pass extra information, e.g. headers, with the help of computed or metadata columns.
>
> We can add extra validation stating that physical columns are not allowed to be added/modified/dropped.
>
> However, even with metadata/computed columns, it will require rewriting the query (which will be done as part of the operation).
> WDYT?

I agree with the case for a computed column or metadata column, but I still don't think physical columns should be added. I haven't seen a real-world case for it, so it shouldn't be supported with that syntax.


Best,
Ron


Sergey Nuyanzin <[email protected]> wrote on Tue, Oct 21, 2025 at 17:19:

Hi Ron,
Sorry for the delay.

About a more real-world case: sometimes it is required to pass extra information, e.g. headers, with the help of computed or metadata columns.

We can add extra validation stating that physical columns are not allowed to be added/modified/dropped.

However, even with metadata/computed columns, it will require rewriting the query (which will be done as part of the operation).
WDYT?
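The extra validation suggested above could look roughly like the following Python sketch, assuming each requested column change carries the kind of column it targets. `ColumnKind`, `ColumnChange`, and `validate_column_changes` are hypothetical names for illustration; the real check would live in Flink's planner and is not shown here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ColumnKind(Enum):
    # Hypothetical classification of column kinds.
    PHYSICAL = "physical"
    COMPUTED = "computed"
    METADATA = "metadata"


@dataclass(frozen=True)
class ColumnChange:
    # Hypothetical model of one requested change.
    action: str  # "ADD", "MODIFY" or "DROP"
    column: str
    kind: ColumnKind


def validate_column_changes(changes: List[ColumnChange]) -> List[ColumnChange]:
    """Reject changes to physical columns; return the accepted changes.

    Accepted computed/metadata changes still require the defining query
    to be rewritten as part of the operation (not modeled here).
    """
    for change in changes:
        if change.kind is ColumnKind.PHYSICAL:
            raise ValueError(
                f"{change.action} of physical column '{change.column}' is not "
                "allowed on a materialized table"
            )
    return list(changes)


accepted = validate_column_changes([
    ColumnChange("ADD", "headers", ColumnKind.METADATA),
    ColumnChange("ADD", "amount_with_tax", ColumnKind.COMPUTED),
])
print(len(accepted))  # 2
```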

> A new question: regarding the operation ALTER MATERIALIZED TABLE MyTable ADD PARTITION (p1=1, p2='a') WITH ('k1'='v1'), what are its intended semantics? Is it purely a metadata-only operation, or does it also trigger a job to refresh data for the new partition? I believe it should be the latter. What's your view?

Yes, it should also trigger a job after that.
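One way to picture the agreed semantics (a metadata change followed by a refresh job) is the Python sketch below. It assumes a hypothetical in-memory table state and models the triggered job as an `ALTER MATERIALIZED TABLE ... REFRESH PARTITION` statement, borrowing the shape of Flink's existing manual-refresh syntax; `MaterializedTableState` and `add_partition` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MaterializedTableState:
    # Hypothetical in-memory stand-in for catalog metadata.
    name: str
    # Each partition spec maps a partition key to a SQL literal, e.g. {"p2": "'a'"}.
    partitions: List[Dict[str, str]] = field(default_factory=list)
    partition_options: Dict[str, Dict[str, str]] = field(default_factory=dict)
    submitted_jobs: List[str] = field(default_factory=list)


def add_partition(table: MaterializedTableState, spec: Dict[str, str],
                  options: Dict[str, str]) -> str:
    """Hypothetical ADD PARTITION: update metadata, then trigger a refresh."""
    if spec in table.partitions:
        raise ValueError(f"partition {spec} already exists")
    # Step 1: metadata-only change (register the partition and its options).
    spec_sql = ", ".join(f"{key}={value}" for key, value in spec.items())
    table.partitions.append(dict(spec))
    table.partition_options[spec_sql] = dict(options)
    # Step 2: refresh data for the new partition only.
    job = f"ALTER MATERIALIZED TABLE {table.name} REFRESH PARTITION ({spec_sql})"
    table.submitted_jobs.append(job)
    return job


table = MaterializedTableState("MyTable")
print(add_partition(table, {"p1": "1", "p2": "'a'"}, {"k1": "v1"}))
# prints: ALTER MATERIALIZED TABLE MyTable REFRESH PARTITION (p1=1, p2='a')
```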

On Fri, Oct 10, 2025 at 4:39 AM Ron Liu <[email protected]> wrote:

Hi, Sergey.

Thanks for your quick response.

> Probably one of the main use cases here is column name reservation for the future.

I haven't seen a concrete real-world business use case for this feature. My concern is: if the sole motivation is to align syntactically with CTAS (CREATE TABLE AS SELECT), what is its actual value and significance?

> Isn't it the same in the case of tables and pipelines bound to them?
> I was thinking that since there is MaterializedTableManager, then based on the incoming TableChangeOperation it could decide how to process such a change: for example, a full recompute or something more sophisticated.

I believe this is not entirely equivalent to a regular table:

    - A materialized table consists of multiple components: table metadata, pipeline, and data. The pipeline is an integral part of the materialized table and is managed by it. We must ensure the stability and consistency of all these components.
    - Unlike regular tables, the schema and data of a materialized table are derived from and continuously updated by its defining query. Therefore, when adding, modifying, or dropping columns, the correct approach is to first update the query and let the query drive the schema change. This is logically sound and delivers real business value. This capability is already supported by FLIP-492 [1] and FLIP-546 [2] and can be extended further.
    - Allowing column modifications via ALTER MATERIALIZED TABLE ADD/MODIFY/DROP COLUMN would cause a mismatch between the materialized table's query definition and its physical schema, leading to inconsistency and significantly increasing operational and observability costs. It may also confuse users.
    - Due to the special nature of materialized tables, metadata modifications must be handled with great caution. We should not allow arbitrary schema changes: for example, we cannot freely reorder columns or change column types, and we may not even allow arbitrary column deletions, as these could break data compatibility. Moreover, if the underlying physical storage doesn't support such changes, the pipeline may fail to run, and the MaterializedTableManager would be unable to handle it. In such cases, the best solution is for the user to explicitly recreate the materialized table. We must be cautious with user data: we should not silently rebuild the physical table or reprocess historical data on the user's behalf. Additionally, from a technical perspective, MaterializedTableManager may currently lack the capability to perform a full historical backfill, so such operations should be explicitly triggered by the user.

A new question: regarding the operation ALTER MATERIALIZED TABLE MyTable ADD PARTITION (p1=1, p2='a') WITH ('k1'='v1'), what are its intended semantics? Is it purely a metadata-only operation, or does it also trigger a job to refresh data for the new partition? I believe it should be the latter. What's your view?


With respect to the operations proposed in the FLIP, I think we can support those that only affect metadata and do not impact the materialized table's query logic or data update behavior. The following operations are acceptable:


    - Support defining watermarks when creating a MATERIALIZED TABLE.
    - Support specifying a column_list when creating a MATERIALIZED TABLE.
    - Support ALTER MATERIALIZED TABLE to add watermarks, primary keys, or partitions.
    - Support ALTER MATERIALIZED TABLE to drop watermarks, primary keys, or partitions.

For all other operations that would affect data updates or require query rewriting, I remain cautious and reserved.
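The split proposed above can be written down as a simple lookup, shown here as a Python sketch. The (statement, target) labels are hypothetical names for the four bullet points; anything outside the set falls into the "cautious" bucket that would need further discussion.

```python
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "metadata-only, supported"
    DEFERRED = "affects data updates or needs query rewriting"


# Hypothetical (statement, target) labels for the operations listed above.
METADATA_ONLY_OPERATIONS = {
    ("CREATE", "WATERMARK"),
    ("CREATE", "COLUMN_LIST"),
    ("ALTER ADD", "WATERMARK"),
    ("ALTER ADD", "PRIMARY KEY"),
    ("ALTER ADD", "PARTITION"),
    ("ALTER DROP", "WATERMARK"),
    ("ALTER DROP", "PRIMARY KEY"),
    ("ALTER DROP", "PARTITION"),
}


def classify(statement: str, target: str) -> Verdict:
    """Return SUPPORTED for the listed metadata-only operations, else DEFERRED."""
    if (statement.upper(), target.upper()) in METADATA_ONLY_OPERATIONS:
        return Verdict.SUPPORTED
    return Verdict.DEFERRED


print(classify("ALTER ADD", "watermark").name)  # SUPPORTED
print(classify("ALTER MODIFY", "column").name)  # DEFERRED
```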


[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-492%3A+Support+Query+Modifications+for+Materialized+Tables
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-546%3A+Introduce+CREATE+OR+ALTER+for+Materialized+Tables

Best,
Ron

Sergey Nuyanzin <[email protected]> wrote on Thu, Oct 9, 2025 at 18:38:

Hi Ron,

Thank you for the feedback.

> When adding new columns via CREATE or ALTER that are not included in the defining query of the Materialized Table, who is responsible for updating the data in these new columns?

When we detect new columns that are not present in the query, we:
1) validate that the type of each of them is nullable (otherwise throw a ValidationException);
2) merge them into the schema;
3) rewrite the materialized table query so that it fills the newly added columns with nulls.
This means the newly rewritten query is responsible for filling these null values. The approach is similar to the way CTAS behaves in the same situation.
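The three steps above can be sketched in Python as follows. This is a simplified model, not the actual converter code: `Column` and `merge_extra_columns` are hypothetical, the real implementation operates on Flink's resolved schema and SQL nodes rather than strings, and appending the extra columns after the query columns is an assumed ordering.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Column:
    # Hypothetical model of a column, not a Flink class.
    name: str
    data_type: str
    nullable: bool = True


def merge_extra_columns(
    query_sql: str, query_columns: List[Column], declared: List[Column]
) -> Tuple[List[Column], str]:
    query_names = {c.name for c in query_columns}
    extra = [c for c in declared if c.name not in query_names]

    # 1) Every column the query does not produce must be nullable.
    for column in extra:
        if not column.nullable:
            raise ValueError(
                f"column '{column.name}' is not produced by the query "
                "and must therefore be nullable"
            )

    # 2) Merge the extra columns into the schema (assumed: appended at the end).
    schema = list(query_columns) + extra

    # 3) Rewrite the query so that it fills the extra columns with NULLs.
    projection = [f"`{c.name}`" for c in query_columns]
    projection += [f"CAST(NULL AS {c.data_type}) AS `{c.name}`" for c in extra]
    rewritten = f"SELECT {', '.join(projection)} FROM ({query_sql})"
    return schema, rewritten


schema, sql = merge_extra_columns(
    "SELECT id FROM src",
    [Column("id", "INT", nullable=False)],
    [Column("note", "STRING")],
)
print([c.name for c in schema])  # ['id', 'note']
print(sql)  # SELECT `id`, CAST(NULL AS STRING) AS `note` FROM (SELECT id FROM src)
```

Casting NULL to the declared type keeps the rewritten query's output types aligned with the merged schema.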

Probably one of the main use cases here is column name reservation for the future.

> For ALTER MATERIALIZED TABLE operations like ADD/MODIFY/DROP columns, these changes could cause the pipeline bound to the Materialized Table to fail.

Isn't it the same in the case of tables and pipelines bound to them?
I was thinking that since there is MaterializedTableManager, then based on the incoming TableChangeOperation it could decide how to process such a change: for example, a full recompute or something more sophisticated.

Looking forward to your comments.

On Thu, Oct 9, 2025 at 11:13 AM Ron Liu <[email protected]> wrote:

Hi, Sergey.

I was on vacation recently, so sorry for joining this discussion so late.

I've carefully reviewed the FLIP, and purely from the perspective of aligning Materialized Table operations with those of a regular Table, I support this proposal in principle. However, in my understanding, Materialized Tables and regular Tables are fundamentally different. A Materialized Table is bound to a specific pipeline that updates its data; this pipeline is generated from the associated query. In contrast, a regular Table isn't tied to any pipeline; users manually write queries to update its data. Performing an ALTER operation on a regular Table only modifies metadata, whereas performing ALTER on a Materialized Table affects not only metadata but also the underlying data update mechanism.

Given this context, I have the following questions:
1. When adding new columns via CREATE or ALTER that are not included in the defining query of the Materialized Table, who is responsible for updating the data in these new columns? I'm unclear about the purpose and use case for adding such columns. Could you provide a concrete example?
2. For ALTER MATERIALIZED TABLE operations like ADD/MODIFY/DROP columns, these changes could cause the pipeline bound to the Materialized Table to fail. What is the exact execution flow for these operations? Could you elaborate on the runtime behavior for each type of operation? Since these actions impact actual data updates, not just metadata, this is a critical concern.

In summary, I believe we shouldn't blindly apply all regular Table operations directly to Materialized Tables. Instead, we should selectively support a subset of operations based on real-world usage scenarios and semantic correctness. What's your take on this?

Best,
Ron

Sergey Nuyanzin <[email protected]> wrote on Wed, Oct 8, 2025 at 08:05:

Hi Lincoln,

Thank you for your feedback.

I guess we already have similar behavior for CTAS, where we can specify more columns than the query produces. In this case, these extra columns are filled with nulls, and the query is rewritten accordingly [1]. This also means that the extra columns must have a nullable type (there is a dedicated validation for this). So for non-query columns we have such default values, and the query is rewritten taking them into account.

Regarding adding columns with ALTER, or other changes like adding/dropping columns, constraints, or distribution: if I understand correctly, MaterializedTableManager, by looking at the table change, can decide whether it should recompute the materialized table or not.

Would it make sense?

[1] https://github.com/apache/flink/blob/3478ddf08bce49e271f69b922a37ccada6f58688/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/operations/converters/table/SqlCreateTableAsConverter.java#L66-L74


On Tue, Oct 7, 2025 at 4:14 AM Lincoln Lee <[email protected]> wrote:

Thanks Sergey for driving this FLIP, it's a great addition to materialized tables!

Since it coincided with China's National Day holiday and everyone is still on vacation, we couldn't reply promptly.

I haven't fully reviewed all the content in the FLIP yet, but there's an important issue with the ALTER statement:

Unlike a regular CREATE TABLE, a Materialized Table derives its schema from the defining query: columns are generated based on the query (and, similar to a materialized view, the underlying data for these columns is tightly coupled to the query definition). Therefore, we cannot simply interpret the effect of a single `ALTER MATERIALIZED TABLE ADD New_Column` statement. Supporting this likely requires an accompanying column default value and raises compatibility concerns regarding historical data, which is a complex topic we previously discussed offline during the design of FLIP-492.

Also, once Ron is back in the office, he may give a more detailed comment.


Best,
Lincoln Lee


Sergey Nuyanzin <[email protected]> wrote on Thu, Oct 2, 2025 at 20:15:

Thank you, Ramin.

If there is no more feedback or objections, I will start the voting thread next week.

On Thu, Sep 25, 2025 at 10:43 AM Ramin Gharib <[email protected]> wrote:

Hi Sergey,
Thanks for driving this! This sounds good to me! +1

Cheers,

Ramin

On Wed, Sep 24, 2025 at 2:14 PM Sergey Nuyanzin <[email protected]> wrote:

Hi everyone,
I'd like to start a discussion of FLIP-550: Add similar support for CREATE/ALTER operations for MATERIALIZED TABLEs as for TABLEs [1].

This FLIP is another step towards making tables and materialized tables more consistent. There was already one improvement in that direction, FLIP-542 [2], which added DISTRIBUTION and SHOW MATERIALIZED TABLES support. However, several more differences were noticed when comparing the behavior of CREATE and ALTER operations. For instance, right now it is impossible to set anything but a table constraint for materialized tables, while for tables (CREATE TABLE AS) it has been possible to provide a schema definition since FLIP-463 [3]. Also, ALTER operations for TABLE are much more mature than for MATERIALIZED TABLE. This FLIP aims to reduce the difference by enabling more similar features for materialized tables.

Introducing schema definition support for materialized tables will provide users with greater control and flexibility and will also unify the usage of tables and materialized tables.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=387648095
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-542%3A+Make+materialized+table+DDL+consistent+with+regular+tables
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement

--
Best regards,
Sergey