Hi, Sergey

Makes sense to me.

Best,
Ron

Sergey Nuyanzin <[email protected]> wrote on Wed, Oct 29, 2025 at 06:29:

> Hi Ron,
> thank you for your reply
>
> >>> I agree with the case for a compute column or metadata column, but I
> still don't think physical columns should be added. I haven't seen a
> real-world case for it, so it shouldn't be supported with that syntax.
>
> As mentioned above, I'm ok to exclude physical columns from this FLIP
> and introduce a separate validation which will forbid them.
>
> If this sounds ok, then I will update FLIP's page about that
>
> On Tue, Oct 28, 2025 at 2:45 AM Ron Liu <[email protected]> wrote:
> >
> > Hi, Sergey
> >
> > Sorry for the late reply.
> >
> > >>> About more real-world case: sometimes it is required to pass extra
> > information like e.g. headers with help of compute or metadata
> > columns.
> >
> > We can add extra validation telling that physical columns are not
> > allowed to be added/modified/dropped.
> >
> > However even with metadata/compute columns it will require rewriting
> > the query (which will be done as a part of the operation).
> > WDYT?
> >
> > I agree with the case for a compute column or metadata column, but I
> > still don't think physical columns should be added. I haven't seen a
> > real-world case for it, so it shouldn't be supported with that syntax.
> >
> > Best,
> > Ron
> >
> > Sergey Nuyanzin <[email protected]> wrote on Tue, Oct 21, 2025 at 17:19:
> >
> > > Hi Ron,
> > > sorry for the delay
> > >
> > > About more real-world case: sometimes it is required to pass extra
> > > information like e.g. headers with help of compute or metadata
> > > columns.
> > >
> > > We can add extra validation telling that physical columns are not
> > > allowed to be added/modified/dropped.
> > >
> > > However even with metadata/compute columns it will require rewriting
> > > the query (which will be done as a part of the operation).
> > > WDYT?
> > >
> > > >A new question: regarding the operation ALTER MATERIALIZED TABLE MyTable
> > > >ADD PARTITION (p1=1, p2='a') WITH ('k1'='v1'), what is its intended
> > > >semantics? Is it purely a metadata-only operation, or does it also
> > > >trigger a job to refresh data for the new partition? I believe it
> > > >should be the latter; what’s your view?
> > >
> > > yes it also should trigger job after that
> > >
> > > On Fri, Oct 10, 2025 at 4:39 AM Ron Liu <[email protected]> wrote:
> > > >
> > > > Hi, Sergey.
> > > >
> > > > Thanks for your quick response.
> > > >
> > > > >>> Probably one of the main use cases here is column name reservation
> > > > for the future.
> > > >
> > > > I haven’t seen a concrete real-world business use case for this
> > > > feature. My concern is: if the sole motivation is to align
> > > > syntactically with CTAS (Create Table As Select), what is its actual
> > > > value and significance?
> > > >
> > > > >>> Isn't it the same in the case of tables and pipelines bound to them?
> > > > I was thinking that since there is MaterializedTableManager, then
> > > > based on coming TableChangeOperation
> > > > it could decide then how to process such change: for example full
> > > > recompute or something more sophisticated
> > > >
> > > > I believe this is not entirely equivalent to a regular table:
> > > >
> > > > - A materialized table consists of multiple components: table
> > > >   metadata, pipeline, and data. The pipeline is an integral part of
> > > >   the materialized table and is managed by it. We must ensure the
> > > >   stability and consistency of all these components.
> > > > - Unlike regular tables, the schema and data of a materialized table
> > > >   are derived from and continuously updated by its defining query.
> > > >   Therefore, when adding, modifying, or dropping columns, the correct
> > > >   approach should be to first update the query, and let the query
> > > >   drive the schema change. This is logically sound and delivers real
> > > >   business value. This capability is already supported in FLIP-492 [1]
> > > >   and FLIP-546 [2] and can be further extended.
> > > > - Allowing column modifications via ALTER MATERIALIZED TABLE
> > > >   ADD/MODIFY/DROP COLUMN would cause a mismatch between the
> > > >   materialized table’s query definition and its physical schema,
> > > >   leading to inconsistency and significantly increasing operational
> > > >   and observability costs. It may also confuse users.
> > > > - Due to the special nature of materialized tables, metadata
> > > >   modifications must be handled with great caution. We should not
> > > >   allow arbitrary schema changes: for example, we cannot freely
> > > >   reorder columns or change column types, and we may not even allow
> > > >   arbitrary column deletions, as these could break data compatibility.
> > > >   Moreover, if the underlying physical storage doesn’t support such
> > > >   changes, the pipeline may fail to run, and the
> > > >   MaterializedTableManager would be unable to handle it. In such
> > > >   cases, the best solution is for the user to explicitly recreate the
> > > >   materialized table. We must be cautious with user data: we should
> > > >   not silently rebuild the physical table or reprocess historical data
> > > >   on the user’s behalf. Additionally, from a technical perspective, we
> > > >   may currently lack the capability to perform a full historical
> > > >   backfill in MaterializedTableManager, so such operations should be
> > > >   explicitly triggered by the user.
> > > >
> > > > A new question: regarding the operation ALTER MATERIALIZED TABLE MyTable
> > > > ADD PARTITION (p1=1, p2='a') WITH ('k1'='v1'), what is its intended
> > > > semantics? Is it purely a metadata-only operation, or does it also
> > > > trigger a job to refresh data for the new partition? I believe it
> > > > should be the latter; what’s your view?
> > > >
> > > > With respect to the operations proposed in the FLIP, I think we can
> > > > support those that only affect metadata and do not impact the
> > > > materialized table’s query logic or data update behavior. The
> > > > following operations are acceptable:
> > > >
> > > > - Support defining watermarks when creating a MATERIALIZED TABLE.
> > > > - Support specifying a column_list when creating a MATERIALIZED TABLE.
> > > > - Support ALTER MATERIALIZED TABLE to add watermarks, primary keys,
> > > >   or partitions.
> > > > - Support ALTER MATERIALIZED TABLE to drop watermarks, primary keys,
> > > >   or partitions.
> > > >
> > > > For all other operations that would affect data updates or require
> > > > query rewriting, I remain cautious and reserved.
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-492%3A+Support+Query+Modifications+for+Materialized+Tables
> > > >
> > > > [2]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-546%3A+Introduce+CREATE+OR+ALTER+for+Materialized+Tables
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > > Sergey Nuyanzin <[email protected]> wrote on Thu, Oct 9, 2025 at 18:38:
> > > >
> > > > > Hi Ron,
> > > > >
> > > > > thank you for the feedback
> > > > >
> > > > > >When adding new columns via CREATE or ALTER that are not included in
> > > > > >the defining query of the Materialized Table, who is responsible for
> > > > > >updating the data in these new columns?
> > > > >
> > > > > when we detect some new columns which are not present in query,
> > > > > then
> > > > > 1) validate that the type of each of them is nullable (otherwise
> > > > >    throw ValidationException)
> > > > > 2) merge them into schema
> > > > > 3) rewrite materializedTable query in a way that now this query
> > > > >    fills newly added columns with nulls.
> > > > > It means that newly rewritten query will be responsible for filling
> > > > > these null values
> > > > > The approach is similar to the way CTAS behaves in the same situation.
> > > > >
> > > > > Probably one of the main use cases here is column name reservation
> > > > > for the future.
> > > > >
> > > > > >For ALTER MATERIALIZED TABLE operations like ADD/MODIFY/DROP
> > > > > >columns, these changes could cause the pipeline bound to the
> > > > > >Materialized Table to fail.
> > > > >
> > > > > Isn't it the same in the case of tables and pipelines bound to them?
> > > > > I was thinking that since there is MaterializedTableManager, then
> > > > > based on coming TableChangeOperation
> > > > > it could decide then how to process such change: for example full
> > > > > recompute or something more sophisticated
> > > > >
> > > > > Looking forward to your comments
> > > > >
> > > > > On Thu, Oct 9, 2025 at 11:13 AM Ron Liu <[email protected]> wrote:
> > > > > >
> > > > > > Hi, Sergey.
> > > > > >
> > > > > > I was on vacation recently, so sorry for joining this discussion
> > > > > > so late.
> > > > > >
> > > > > > I’ve carefully reviewed the FLIP, and purely from the perspective
> > > > > > of aligning Materialized Table operations with those of a regular
> > > > > > Table, I support this proposal in principle. However, in my
> > > > > > understanding, Materialized Tables and regular Tables are
> > > > > > fundamentally different. A Materialized Table is bound to a
> > > > > > specific pipeline that updates its data; this pipeline is
> > > > > > generated from the associated query.
> > > > > > In contrast, a regular Table isn’t tied to any pipeline; users
> > > > > > manually write queries to update its data. Performing an ALTER
> > > > > > operation on a regular Table only modifies metadata, whereas
> > > > > > performing ALTER on a Materialized Table affects not only metadata
> > > > > > but also the underlying data update mechanism.
> > > > > >
> > > > > > Given this context, I have the following questions:
> > > > > > 1. When adding new columns via CREATE or ALTER that are not
> > > > > >    included in the defining query of the Materialized Table, who
> > > > > >    is responsible for updating the data in these new columns? I’m
> > > > > >    unclear about the purpose and use case for adding such columns.
> > > > > >    Could you provide a concrete example?
> > > > > > 2. For ALTER MATERIALIZED TABLE operations like ADD/MODIFY/DROP
> > > > > >    columns, these changes could cause the pipeline bound to the
> > > > > >    Materialized Table to fail. What is the exact execution flow
> > > > > >    for these operations? Could you elaborate on the runtime
> > > > > >    behavior for each type of operation? Since these actions impact
> > > > > >    actual data updates, not just metadata, this is a critical
> > > > > >    concern.
> > > > > > In summary, I believe we shouldn’t blindly apply all regular Table
> > > > > > operations directly to Materialized Tables. Instead, we should
> > > > > > selectively support a subset of operations based on real-world
> > > > > > usage scenarios and semantic correctness. What’s your take on this?
> > > > > >
> > > > > > Best,
> > > > > > Ron
> > > > > >
> > > > > > Sergey Nuyanzin <[email protected]> wrote on Wed, Oct 8, 2025 at 08:05:
> > > > > >
> > > > > > > Hi Lincoln,
> > > > > > >
> > > > > > > Thank you for your feedback.
> > > > > > >
> > > > > > > I guess we already have similar behavior for CTAS, where we
> > > > > > > could put more columns than we have for query.
> > > > > > > In this case these extra columns should be filled with nulls,
> > > > > > > and the query should be rewritten accordingly [1].
> > > > > > > This also means that extra columns should have nullable type
> > > > > > > (there is a dedicated validation for this).
> > > > > > > It means that for non query columns we have such default values
> > > > > > > and query is rewritten taking them into account
> > > > > > >
> > > > > > > Regarding adding columns with alter, or some other changes like
> > > > > > > adding/dropping columns, constraints, distribution
> > > > > > > if I understand correctly MaterializedTableManager looking at
> > > > > > > table change can decide whether it should recompute materialized
> > > > > > > table or not
> > > > > > >
> > > > > > > Would it make sense?
> > > > > > >
> > > > > > > [1]
> > > > > > > https://github.com/apache/flink/blob/3478ddf08bce49e271f69b922a37ccada6f58688/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/operations/converters/table/SqlCreateTableAsConverter.java#L66-L74
> > > > > > >
> > > > > > > On Tue, Oct 7, 2025 at 4:14 AM Lincoln Lee <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Thanks Sergey for driving this FLIP, it's a great addition to
> > > > > > > > materialized table!
> > > > > > > >
> > > > > > > > Since it coincided with China's National Day holiday and
> > > > > > > > everyone is still on vacation, we couldn't reply promptly.
> > > > > > > >
> > > > > > > > I haven't fully reviewed all the content in the FLIP yet, but
> > > > > > > > there's an important issue on the ALTER statement:
> > > > > > > >
> > > > > > > > Unlike a regular CREATE TABLE, Materialized Table derives its
> > > > > > > > schema from the defined query, columns are generated based on
> > > > > > > > the query (and, similar to a materialized view, the underlying
> > > > > > > > data for these columns is tightly coupled to the query
> > > > > > > > definition). Therefore, we cannot simply interpret the effect
> > > > > > > > of a single `ALTER MATERIALIZED TABLE ADD New_Column`
> > > > > > > > statement. Supporting this likely requires an accompanying
> > > > > > > > column default value, and raises compatibility concerns
> > > > > > > > regarding historical data, which is a complex topic we
> > > > > > > > previously discussed offline during the design process of
> > > > > > > > FLIP-492.
> > > > > > > >
> > > > > > > > Also, once Ron is back in the office, he may give a more
> > > > > > > > detailed comment.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Lincoln Lee
> > > > > > > >
> > > > > > > > Sergey Nuyanzin <[email protected]> wrote on Thu, Oct 2, 2025 at 20:15:
> > > > > > > >
> > > > > > > > > Thank you Ramin
> > > > > > > > >
> > > > > > > > > In case there is no more feedback/objections
> > > > > > > > > I would start voting thread next week
> > > > > > > > >
> > > > > > > > > On Thu, Sep 25, 2025 at 10:43 AM Ramin Gharib <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Sergey,
> > > > > > > > > > Thanks for driving this! This sounds good to me!
> > > > > > > > > > +1
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > >
> > > > > > > > > > Ramin
> > > > > > > > > >
> > > > > > > > > > On Wed, Sep 24, 2025 at 2:14 PM Sergey Nuyanzin <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi everyone,
> > > > > > > > > > > I'd like to start a discussion of FLIP-550
> > > > > > > > > > > Add similar support for CREATE/ALTER operations for
> > > > > > > > > > > MATERIALIZED TABLEs as for TABLEs [1].
> > > > > > > > > > >
> > > > > > > > > > > This FLIP is another step towards making tables and
> > > > > > > > > > > materialized tables more consistent. There was already
> > > > > > > > > > > one improvement in that direction like FLIP-542 [2] to
> > > > > > > > > > > add DISTRIBUTION and SHOW MATERIALIZED TABLES support.
> > > > > > > > > > > However there were several more things noticed comparing
> > > > > > > > > > > behavior for CREATE and ALTER operations. For instance
> > > > > > > > > > > right now for materialized tables it is impossible to
> > > > > > > > > > > set anything but table constraint while for tables
> > > > > > > > > > > (CREATE TABLE AS) it is possible to provide schema
> > > > > > > > > > > definition since FLIP-463 [3], also ALTER operations for
> > > > > > > > > > > TABLE are far more mature than for MATERIALIZED TABLE.
> > > > > > > > > > > This FLIP aims to decrease the difference by enabling
> > > > > > > > > > > more similar features for materialized tables.
> > > > > > > > > > >
> > > > > > > > > > > Introducing schema definition support for materialized
> > > > > > > > > > > tables will provide users with greater control and
> > > > > > > > > > > flexibility and also will unify usage of tables and
> > > > > > > > > > > materialized tables.
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=387648095
> > > > > > > > > > >
> > > > > > > > > > > [2]
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-542%3A+Make+materialized+table+DDL+consistent+with+regular+tables
> > > > > > > > > > >
> > > > > > > > > > > [3]
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best regards,
> > > > > > > > > > > Sergey
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best regards,
> > > > > > > > > Sergey
> > > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Sergey
> > > > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Sergey
> > > > >
> > >
> > > --
> > > Best regards,
> > > Sergey
> > >
>
> --
> Best regards,
> Sergey
>
