Hi Sergey,

Sorry for the late reply.

>>> About more real-world case: sometimes it is required to pass extra
>>> information like e.g. headers with help of compute or metadata columns.
>>> We can add extra validation telling that physical columns are not
>>> allowed to be added/modified/dropped.
>>> However even with metadata/compute columns it will require rewriting
>>> the query (which will be done as a part of the operation).
>>> WDYT?

I agree with the case for a computed column or a metadata column, but I
still don't think physical columns should be added. I haven't seen a
real-world case for it, so it shouldn't be supported with that syntax.

Best,
Ron

Sergey Nuyanzin wrote on Tue, Oct 21, 2025 at 17:19:

> Hi Ron,
> sorry for the delay
>
> About more real-world case: sometimes it is required to pass extra
> information like e.g. headers with help of compute or metadata
> columns.
>
> We can add extra validation telling that physical columns are not
> allowed to be added/modified/dropped.
>
> However even with metadata/compute columns it will require rewriting
> the query (which will be done as a part of the operation).
> WDYT?
>
> >A new question: regarding the operation ALTER MATERIALIZED TABLE MyTable
> >ADD PARTITION (p1=1, p2='a') WITH ('k1'='v1'), what is its intended
> >semantics? Is it purely a metadata-only operation, or does it also trigger
> >a job to refresh data for the new partition? I believe it should be the
> >latter. What’s your view?
>
> yes it also should trigger job after that
>
> On Fri, Oct 10, 2025 at 4:39 AM Ron Liu wrote:
> >
> > Hi, Sergey.
> >
> > Thanks for your quick response.
> >
> > >>> Probably one of the main use cases here is column name reservation
> > >>> for the future.
> >
> > I haven’t seen a concrete real-world business use case for this feature.
> > My concern is: if the sole motivation is to align syntactically with
> > CTAS (Create Table As Select), what is its actual value and significance?
> >
> > >>> Isn't it the same in the case of tables and pipelines bound to them?
> > >>> I was thinking that since there is MaterializedTableManager, then
> > >>> based on coming TableChangeOperation
> > >>> it could decide then how to process such change: for example full
> > >>> recompute or something more sophisticated
> >
> > I believe this is not entirely equivalent to a regular table:
> >
> >    - A materialized table consists of multiple components: table
> >    metadata, pipeline, and data. The pipeline is an integral part of the
> >    materialized table and is managed by it. We must ensure the stability
> >    and consistency of all these components.
> >    - Unlike regular tables, the schema and data of a materialized table
> >    are derived from and continuously updated by its defining query.
> >    Therefore, when adding, modifying, or dropping columns, the correct
> >    approach should be to first update the query, and let the query drive
> >    the schema change. This is logically sound and delivers real business
> >    value. This capability is already supported in FLIP-492[1] and
> >    FLIP-546[2] and can be further extended.
> >    - Allowing column modifications via ALTER MATERIALIZED TABLE
> >    ADD/MODIFY/DROP COLUMN would cause a mismatch between the materialized
> >    table’s query definition and its physical schema, leading to
> >    inconsistency and significantly increasing operational and
> >    observability costs. It may also confuse the user.
> >    - Due to the special nature of materialized tables, metadata
> >    modifications must be handled with great caution. We should not allow
> >    arbitrary schema changes: for example, we cannot freely reorder
> >    columns or change column types, and we may not even allow arbitrary
> >    column deletions, as these could break data compatibility. Moreover,
> >    if the underlying physical storage doesn’t support such changes, the
> >    pipeline may fail to run, and the MaterializedTableManager would be
> >    unable to handle it.
> >    In such cases, the best solution is for the user to explicitly
> >    recreate the materialized table. We must be cautious with user data:
> >    we should not silently rebuild the physical table or reprocess
> >    historical data on the user’s behalf. Additionally, from a technical
> >    perspective, we may currently lack the capability to perform a full
> >    historical backfill in MaterializedTableManager, so such operations
> >    should be explicitly triggered by the user.
> >
> > A new question: regarding the operation ALTER MATERIALIZED TABLE MyTable
> > ADD PARTITION (p1=1, p2='a') WITH ('k1'='v1'), what is its intended
> > semantics? Is it purely a metadata-only operation, or does it also
> > trigger a job to refresh data for the new partition? I believe it should
> > be the latter. What’s your view?
> >
> >
> > With respect to the operations proposed in the FLIP, I think we can
> > support those that only affect metadata and do not impact the
> > materialized table’s query logic or data update behavior. The following
> > operations are acceptable:
> >
> >    - Support defining watermarks when creating a MATERIALIZED TABLE.
> >    - Support specifying a column_list when creating a MATERIALIZED TABLE.
> >    - Support ALTER MATERIALIZED TABLE to add watermarks, primary keys, or
> >    partitions.
> >    - Support ALTER MATERIALIZED TABLE to drop watermarks, primary keys, or
> >    partitions.
> >
> > For all other operations that would affect data updates or require query
> > rewriting, I remain cautious and reserved.
> >
> >
> > 1.
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-492%3A+Support+Query+Modifications+for+Materialized+Tables
> >
> > 2.
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-546%3A+Introduce+CREATE+OR+ALTER+for+Materialized+Tables
> >
> > Best,
> > Ron
> >
> > Sergey Nuyanzin wrote on Thu, Oct 9, 2025 at 18:38:
> >
> > > Hi Ron,
> > >
> > > thank you for the feedback
> > >
> > > >When adding new columns via CREATE or ALTER that are not included in
> > > >the defining query of the Materialized Table, who is responsible for
> > > >updating the data in these new columns?
> > >
> > > when we detect some new columns which are not present in query,
> > > then
> > > 1) validate that the type of each of them is nullable (otherwise throw
> > > ValidationException)
> > > 2) merge them into schema
> > > 3) rewrite materializedTable query in a way that now this query fills
> > > newly added columns with nulls.
> > > It means that newly rewritten query will be responsible for filling
> > > these null values
> > > The approach is similar to the way CTAS behaves in the same situation.
> > >
> > > Probably one of the main use cases here is column name reservation for
> > > the future.
> > >
> > > >For ALTER MATERIALIZED TABLE operations like ADD/MODIFY/DROP
> > > >columns, these changes could cause the pipeline bound to the
> > > >Materialized Table to fail.
> > >
> > > Isn't it the same in the case of tables and pipelines bound to them?
> > > I was thinking that since there is MaterializedTableManager, then
> > > based on coming TableChangeOperation
> > > it could decide then how to process such change: for example full
> > > recompute or something more sophisticated
> > >
> > > Looking forward to your comments
> > >
> > > On Thu, Oct 9, 2025 at 11:13 AM Ron Liu wrote:
> > > >
> > > > Hi, Sergey.
> > > >
> > > > I was on vacation recently, so sorry for joining this discussion so
> > > > late.
> > > >
> > > > I’ve carefully reviewed the FLIP, and purely from the perspective of
> > > > aligning Materialized Table operations with those of a regular Table,
> > > > I support this proposal in principle. However, in my understanding,
> > > > Materialized Tables and regular Tables are fundamentally different. A
> > > > Materialized Table is bound to a specific pipeline that updates its
> > > > data, and this pipeline is generated from the associated query. In
> > > > contrast, a regular Table isn’t tied to any pipeline; users manually
> > > > write queries to update its data. Performing an ALTER operation on a
> > > > regular Table only modifies metadata, whereas performing ALTER on a
> > > > Materialized Table affects not only metadata but also the underlying
> > > > data update mechanism.
> > > >
> > > > Given this context, I have the following questions:
> > > > 1. When adding new columns via CREATE or ALTER that are not included
> > > > in the defining query of the Materialized Table, who is responsible
> > > > for updating the data in these new columns? I’m unclear about the
> > > > purpose and use case for adding such columns. Could you provide a
> > > > concrete example?
> > > > 2. For ALTER MATERIALIZED TABLE operations like ADD/MODIFY/DROP
> > > > columns, these changes could cause the pipeline bound to the
> > > > Materialized Table to fail. What is the exact execution flow for these
> > > > operations? Could you elaborate on the runtime behavior for each type
> > > > of operation? Since these actions impact actual data updates, not just
> > > > metadata, this is a critical concern.
> > > >
> > > > In summary, I believe we shouldn’t blindly apply all regular Table
> > > > operations directly to Materialized Tables. Instead, we should
> > > > selectively support a subset of operations based on real-world usage
> > > > scenarios and semantic correctness. What’s your take on this?
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > > Sergey Nuyanzin wrote on Wed, Oct 8, 2025 at 08:05:
> > > >
> > > > > Hi Lincoln,
> > > > >
> > > > > Thank you for your feedback.
> > > > >
> > > > > I guess we already have similar behavior for CTAS, where we could
> > > > > put more columns than we have for query.
> > > > > In this case these extra columns should be filled with nulls, and
> > > > > the query should be rewritten accordingly [1].
> > > > > This also means that extra columns should have nullable type
> > > > > (there is a dedicated validation for this).
> > > > > It means that for non query columns we have such default values and
> > > > > query is rewritten taking them into account
> > > > >
> > > > > Regarding adding columns with alter, or some other changes like
> > > > > adding/dropping columns, constraints, distribution
> > > > > if I understand correctly MaterializedTableManager looking at table
> > > > > change can decide whether it should recompute materialized table or
> > > > > not
> > > > >
> > > > > Would it make sense?
> > > > >
> > > > > [1]
> > > > > https://github.com/apache/flink/blob/3478ddf08bce49e271f69b922a37ccada6f58688/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/operations/converters/table/SqlCreateTableAsConverter.java#L66-L74
> > > > >
> > > > >
> > > > > On Tue, Oct 7, 2025 at 4:14 AM Lincoln Lee wrote:
> > > > > >
> > > > > > Thanks Sergey for driving this FLIP, it's a great addition to
> > > > > > materialized table!
> > > > > >
> > > > > > Since it coincided with China's National Day holiday and everyone
> > > > > > is still on vacation, we couldn't reply promptly.
> > > > > >
> > > > > > I haven't fully reviewed all the content in the FLIP yet, but
> > > > > > there's an important issue on the ALTER statement:
> > > > > >
> > > > > > Unlike a regular CREATE TABLE, Materialized Table derives its
> > > > > > schema from the defined query: columns are generated based on the
> > > > > > query (and, similar to a materialized view, the underlying data
> > > > > > for these columns is tightly coupled to the query definition).
> > > > > > Therefore, we cannot simply interpret the effect of a single
> > > > > > `ALTER MATERIALIZED TABLE ADD New_Column` statement. Supporting
> > > > > > this likely requires an accompanying column default value, and
> > > > > > raises compatibility concerns regarding historical data, which is
> > > > > > a complex topic we previously discussed offline during the design
> > > > > > process of FLIP-492.
> > > > > >
> > > > > > Also, once Ron is back in the office, he may give a more detailed
> > > > > > comment.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Lincoln Lee
> > > > > >
> > > > > >
> > > > > > Sergey Nuyanzin wrote on Thu, Oct 2, 2025 at 20:15:
> > > > > >
> > > > > > > Thank you Ramin
> > > > > > >
> > > > > > > In case there is no more feedback/objections
> > > > > > > I would start voting thread next week
> > > > > > >
> > > > > > > On Thu, Sep 25, 2025 at 10:43 AM Ramin Gharib wrote:
> > > > > > > >
> > > > > > > > Hi Sergey,
> > > > > > > > Thanks for driving this! This sounds good to me!
> > > > > > > > +1
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > > Ramin
> > > > > > > >
> > > > > > > > On Wed, Sep 24, 2025 at 2:14 PM Sergey Nuyanzin wrote:
> > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > > I'd like to start a discussion of FLIP-550
> > > > > > > > > Add similar support for CREATE/ALTER operations for
> > > > > > > > > MATERIALIZED TABLEs as for TABLEs [1].
> > > > > > > > >
> > > > > > > > > This FLIP is another step towards making tables and
> > > > > > > > > materialized tables more consistent. There was already one
> > > > > > > > > improvement in that direction, FLIP-542 [2], which added
> > > > > > > > > DISTRIBUTION and SHOW MATERIALIZED TABLES support. However,
> > > > > > > > > several more differences were noticed when comparing the
> > > > > > > > > behavior of CREATE and ALTER operations. For instance, right
> > > > > > > > > now for materialized tables it is impossible to set anything
> > > > > > > > > but a table constraint, while for tables (CREATE TABLE AS) it
> > > > > > > > > is possible to provide a schema definition since FLIP-463 [3].
> > > > > > > > > Also, ALTER operations for TABLE are far more mature than for
> > > > > > > > > MATERIALIZED TABLE. This FLIP aims to reduce the difference by
> > > > > > > > > enabling more similar features for materialized tables.
> > > > > > > > >
> > > > > > > > > Introducing schema definition support for materialized tables
> > > > > > > > > will provide users with greater control and flexibility and
> > > > > > > > > will also unify usage of tables and materialized tables.
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=387648095
> > > > > > > > >
> > > > > > > > > [2]
> > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-542%3A+Make+materialized+table+DDL+consistent+with+regular+tables
> > > > > > > > >
> > > > > > > > > [3]
> > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best regards,
> > > > > > > > > Sergey
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Sergey
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Sergey
> > >
> > > --
> > > Best regards,
> > > Sergey
>
> --
> Best regards,
> Sergey
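
The three-step handling of extra columns described in the thread (validate that each column missing from the query is nullable, merge it into the schema, rewrite the query so it fills the new column with NULLs) could be sketched roughly as below. This is an illustrative Python sketch, not Flink code: `Column`, `add_extra_columns`, and the Python `ValidationException` class are hypothetical names, appending the new columns after the query columns is an assumption, and the real planner works on parsed SQL rather than on strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    # Hypothetical, simplified column descriptor for this sketch.
    name: str
    sql_type: str
    nullable: bool = True


class ValidationException(Exception):
    """Stand-in for the ValidationException mentioned in the thread."""


def add_extra_columns(query, query_columns, declared_columns):
    """Return (merged_schema, rewritten_query) for columns missing from the query."""
    query_names = {c.name for c in query_columns}
    extra = [c for c in declared_columns if c.name not in query_names]

    # Step 1: every column the query does not produce must be nullable.
    for col in extra:
        if not col.nullable:
            raise ValidationException(
                f"Column '{col.name}' is not produced by the query and must be nullable"
            )

    # Step 2: merge into the schema (appending at the end is an assumption).
    merged = list(query_columns) + extra

    # Step 3: wrap the original query so the new columns are filled with NULLs.
    projections = [f"`{c.name}`" for c in query_columns]
    projections += [f"CAST(NULL AS {c.sql_type}) AS `{c.name}`" for c in extra]
    rewritten = f"SELECT {', '.join(projections)} FROM ({query})"
    return merged, rewritten


schema, sql = add_extra_columns(
    "SELECT id, amount FROM orders",
    [Column("id", "BIGINT", nullable=False), Column("amount", "DOUBLE")],
    [Column("reserved_note", "STRING")],
)
print(sql)
```

The sketch also shows the concern raised in the replies: after the rewrite, the stored query no longer matches what the user originally wrote, so the schema and the query definition have to be kept in sync by whoever performs the change.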
