Hi, xia

> which I think if Flink supports table cache in framework-level, we can also recache in framework-level for truncate table statement.

I think the Flink catalog already stores some statistics for a table. For
example, after `ANALYZE TABLE`, the table's statistics are stored in the
catalog, but truncating the table will not correct them.

I know it's hard for Flink to do unified follow-up actions after truncating
a table, but I think we need to define a clear position for the Flink
catalog. IMO, since Flink is a compute engine, it's hard for it to maintain
the catalog for tables in different storage systems by itself. So as more
and more `Executable` commands are introduced, the data in the catalog will
become inconsistent with the underlying storage. In this case, after a
truncate, the following parts of the catalog may be affected:

- the table/column statistics will no longer be correct
- the partitions of this table should be cleared

Best,
Aitozi.

liu ron <ron9....@gmail.com> wrote on Thu, Apr 13, 2023 at 11:28:

> Hi, xia
>
> Thanks for your explanation. For the first question, given the current
> status, I think we can provide the generic interface in the future if we
> need it. For the second question, it makes sense to me if we can
> support the table cache at the framework level.
>
> Best,
> Ron
>
> yuxia <luoyu...@alumni.sjtu.edu.cn> wrote on Tue, Apr 11, 2023 at 16:12:
>
> > Hi, ron.
> >
> > 1: Considering that, to delete rows, Flink also writes delete records to
> > achieve the purpose of deleting data, it may not be so strange for
> > connector devs to make DynamicTableSink implement SupportsTruncate to
> > support truncating the table. Based on the assumption that
> > DynamicTableSink is used for inserting/updating/deleting, I think it's
> > reasonable for DynamicTableSink to implement SupportsTruncate. That said,
> > it does sound reasonable to add a generic interface like DynamicTable to
> > differentiate DynamicTableSource & DynamicTableSink. But it will
> > definitely require much design and discussion, which deserves a
> > dedicated FLIP. I prefer not to do that in this FLIP to avoid
> > overdesign, and I think it's not a must for this FLIP.
> > Maybe we can discuss it some day if we do need the new generic table
> > interface.
> >
> > 2: Considering the variety of catalogs and tables, it's hard for Flink to
> > do unified follow-up actions after truncating a table. But the external
> > connector can still do such follow-up actions in the method
> > `executeTruncation`.
> > Btw, in Spark, for the new truncate table interface [1], Spark only
> > recaches the table after truncating the table [2], which I think if Flink
> > supports table cache in framework-level, we can also recache in
> > framework-level for truncate table statement.
> >
> > [1]
> > https://github.com/apache/spark/blob/1a42aa5bd44e7524bb55463bbd85bea782715834/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
> > [2]
> > https://github.com/apache/spark/blob/06c09a79b371c5ac3e4ebad1118ed94b460f48d1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/TruncateTableExec.scala
> >
> >
> > I think the external catalog can implement such logic in the method
> > `executeTruncation`.
> >
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "liu ron" <ron9....@gmail.com>
> > To: "dev" <dev@flink.apache.org>
> > Sent: Tuesday, April 11, 2023, 10:51:36 AM
> > Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> >
> > Hi, xia
> > It's a nice improvement to support the TRUNCATE TABLE statement, making
> > Flink more feature-rich.
> > I think the truncate syntax is a command that will be executed in the
> > client's process, rather than pulling up a Flink job to execute on the
> > cluster. So on the user-facing interface, I think we should not let
> > users implement the SupportsTruncate interface on the DynamicTableSink
> > interface. This seems a bit strange and also confuses users; as hang
> > said, why does the source table not support truncate?
> > It would be nice if we could come up with a generic interface that
> > supports truncate instead of binding it to the DynamicTableSink
> > interface, and maybe in the future we will support more commands like
> > the truncate command.
> >
> > In addition, after truncating data, we may also need to update the
> > metadata of the table. For a Hive table, for example, we need to update
> > the statistics as well as clear the cache in the metastore. I think we
> > should also consider these capabilities. Spark has considered these,
> > refer to
> >
> > https://github.com/apache/spark/blob/69dd20b5e45c7e3533efbfdc1974f59931c1b781/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L573
> >
> > Best,
> >
> > Ron
> >
> > Jim Hughes <jhug...@confluent.io.invalid> wrote on Tue, Apr 11, 2023 at 02:15:
> >
> > > Hi Yuxia,
> > >
> > > On Mon, Apr 10, 2023 at 10:35 AM yuxia <luoyu...@alumni.sjtu.edu.cn>
> > > wrote:
> > >
> > > > Hi, Jim.
> > > >
> > > > 1: I'm expecting all DynamicTableSinks to support it. But it's hard
> > > > to support all in one shot. For the DynamicTableSinks that haven't
> > > > implemented the SupportsTruncate interface, we'll throw an exception
> > > > like 'The truncate statement for the table is not supported as it
> > > > hasn't implemented the interface SupportsTruncate'. Also, a sink that
> > > > doesn't support deleting data can also implement it but throw a more
> > > > concrete exception like "xxx doesn't support to truncate a table as
> > > > delete is impossible for xxx". It depends on the external connector's
> > > > implementation.
> > > > Thanks for your advice, I updated it to the FLIP.
> > > >
> > >
> > > Makes sense.
> > >
> > >
> > > > 2: What do you mean by saying "truncate an input to a streaming
> > > > query"? This FLIP is aimed at supporting the TRUNCATE TABLE statement,
> > > > which is for truncating a table. In which case will it interoperate
> > > > with streaming queries?
> > > >
> > >
> > > Let's take a source like Kafka as an example. Suppose I have an input
> > > topic Foo, and a query which uses it as an input.
> > >
> > > When Foo is truncated, if the truncation works as a delete and create,
> > > then the connector may need to be made aware (otherwise it may try to
> > > use offsets from the previous topic). On the other hand, one may have
> > > to ask Kafka to delete records up to a certain point.
> > >
> > > Also, savepoints for the query may contain information from the
> > > truncated table. Should this FLIP involve invalidating that information
> > > in some manner? Or does truncating a source table for a query cause
> > > undefined behavior on that query?
> > >
> > > Basically, I'm trying to think through the implementations of a
> > > truncate operation to streaming sources and queries.
> > >
> > > Cheers,
> > >
> > > Jim
> > >
> > >
> > > > Best regards,
> > > > Yuxia
> > > >
> > > > ----- Original Message -----
> > > > From: "Jim Hughes" <jhug...@confluent.io.INVALID>
> > > > To: "dev" <dev@flink.apache.org>
> > > > Sent: Monday, April 10, 2023, 9:32:28 PM
> > > > Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> > > >
> > > > Hi Yuxia,
> > > >
> > > > Two questions:
> > > >
> > > > 1. Are you expecting all DynamicTableSinks to support Truncate? The
> > > > FLIP could use some explanation for what supporting and not
> > > > supporting the operation means.
> > > >
> > > > 2. How will truncate interoperate with streaming queries? That is, if
> > > > I truncate an input to a streaming query, is there any defined
> > > > behavior?
> > > >
> > > > Cheers,
> > > >
> > > > Jim
> > > >
> > > > On Wed, Mar 22, 2023 at 9:13 AM yuxia <luoyu...@alumni.sjtu.edu.cn>
> > > > wrote:
> > > >
> > > > > Hi, devs.
> > > > >
> > > > > I'd like to start a discussion about FLIP-302: Support TRUNCATE
> > > > > TABLE statement [1].
> > > > >
> > > > > The TRUNCATE TABLE statement is a SQL command that allows users to
> > > > > quickly and efficiently delete all rows from a table without
> > > > > dropping the table itself. This statement is commonly used in data
> > > > > warehouses, where large data sets are frequently loaded into and
> > > > > unloaded from tables.
> > > > > So, this FLIP is meant to support the TRUNCATE TABLE statement.
> > > > > More exactly, this FLIP will bring Flink the TRUNCATE TABLE syntax
> > > > > and an interface with which the corresponding connectors can
> > > > > implement their own logic for truncating a table.
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > > [1]:
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > > > >
> > > > >
> > > > > Best regards,
> > > > > Yuxia
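
The dispatch discussed in this thread (a sink opts into truncation through an ability interface, and sinks that have not implemented it are rejected with an exception) could be sketched roughly as below. This is only an illustration under stated assumptions: `DynamicTableSink`, `SupportsTruncate` and `executeTruncation` are simplified stand-ins declared locally to mirror the names used in the thread, not Flink's actual classes or signatures, and `InMemorySink`, `AppendOnlySink`, `TruncateSketch`, `name()` and `statsCleared` are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in (assumption): NOT Flink's real DynamicTableSink.
interface DynamicTableSink {
    String name();
}

// Simplified stand-in (assumption) for the ability interface named in the thread.
interface SupportsTruncate {
    void executeTruncation();
}

// Hypothetical sink that supports truncation: all rows go, the table stays.
class InMemorySink implements DynamicTableSink, SupportsTruncate {
    final List<String> rows = new ArrayList<>();
    boolean statsCleared = false;

    public String name() {
        return "in_memory";
    }

    public void executeTruncation() {
        rows.clear();
        // Connector-specific follow-up, e.g. resetting statistics, as suggested in the thread.
        statsCleared = true;
    }
}

// Hypothetical sink that has not implemented the ability.
class AppendOnlySink implements DynamicTableSink {
    public String name() {
        return "append_only";
    }
}

class TruncateSketch {
    // Client-side dispatch: run the connector's logic, or fail with a clear message.
    static void truncate(DynamicTableSink sink) {
        if (!(sink instanceof SupportsTruncate)) {
            throw new UnsupportedOperationException(
                    "The truncate statement for the table '" + sink.name()
                            + "' is not supported as it hasn't implemented"
                            + " the interface SupportsTruncate");
        }
        ((SupportsTruncate) sink).executeTruncation();
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        sink.rows.add("row-1");
        truncate(sink);
        System.out.println("rows left: " + sink.rows.size());

        try {
            truncate(new AppendOnlySink());
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the follow-up work (statistics, partitions, caches) inside `executeTruncation` matches the position taken above that the connector, not the framework, knows what has to be cleaned up; a framework-level hook would be a separate design.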