Hi, xia

> which I think if Flink supports table cache in framework-level,
> we can also recache in framework-level for truncate table statement.

I think the Flink catalog currently already stores some stats for the table.
For example, after `ANALYZE TABLE`, the table's statistics are stored in the
catalog, but truncating the table will not correct those statistics.

I know it's hard for Flink to do unified follow-up actions after
truncating a table. But I think we need to keep a clearly defined role for the
Flink Catalog in mind.
IMO, since Flink is a compute engine, it's hard for it to maintain the
catalog for tables in different storage systems by itself. So as more and more
`Executable` commands are introduced, the data in the catalog will become
inconsistent with the underlying tables.
In this case, after a truncate, the following parts of the catalog may be affected:

- the table/column statistics will no longer be correct
- the partitions of this table should be cleared
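
To make the two points above concrete, here is a minimal sketch of the kind of
follow-up a catalog could do after a truncate. Note that `InMemoryCatalog`,
`TableMeta`, and `onTruncate` are hypothetical stand-ins I made up for
illustration; they are not Flink's Catalog API, and the real follow-up would
depend on the catalog implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TruncateFollowUpSketch {

    /** Hypothetical stand-in for the metadata a catalog keeps for one table. */
    static final class TableMeta {
        long rowCount;                                          // table-level statistic
        final Map<String, Long> columnNdv = new HashMap<>();    // column-level statistic (distinct values)
        final List<String> partitions = new ArrayList<>();      // registered partitions
    }

    /** Hypothetical in-memory catalog keyed by table name (assumption, not a Flink class). */
    static final class InMemoryCatalog {
        private final Map<String, TableMeta> tables = new HashMap<>();

        TableMeta getOrCreate(String table) {
            return tables.computeIfAbsent(table, t -> new TableMeta());
        }

        /** Follow-up after TRUNCATE: reset statistics and clear the partition list. */
        void onTruncate(String table) {
            TableMeta meta = tables.get(table);
            if (meta == null) {
                throw new IllegalArgumentException("Unknown table: " + table);
            }
            meta.rowCount = 0L;        // a truncated table has no rows
            meta.columnNdv.clear();    // column statistics are stale, so drop them
            meta.partitions.clear();   // no data left in any partition
        }
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        TableMeta orders = catalog.getOrCreate("orders");
        orders.rowCount = 1_000L;
        orders.columnNdv.put("user_id", 250L);
        orders.partitions.add("dt=2023-04-12");

        catalog.onTruncate("orders");
        System.out.println(orders.rowCount + " "
                + orders.columnNdv.size() + " "
                + orders.partitions.size());
    }
}
```

Whether this logic lives in the framework or in each connector's truncate
implementation is exactly the open question here.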


Best,
Aitozi.


On Thu, Apr 13, 2023 at 11:28, liu ron <ron9....@gmail.com> wrote:

>
> Hi, xia
>
> Thanks for your explanation. For the first question, given the current
> status, I think we can provide the generic interface in the future if we
> need it. For the second question, it makes sense to me if we can
> support the table cache at the framework level.
>
> Best,
> Ron
>
> On Tue, Apr 11, 2023 at 16:12, yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
>
> > Hi, ron.
> >
> > 1: Considering that, for deleting rows, Flink also writes delete records to
> > achieve the purpose of deleting data, it may not be so strange for connector
> > devs to make DynamicTableSink implement SupportsTruncate to support
> > truncating the table. Based on the assumption that DynamicTableSink is used for
> > inserting/updating/deleting, I think it's reasonable for DynamicTableSink
> > to implement SupportsTruncate. That said, it also sounds reasonable to add a
> > generic interface like DynamicTable to differentiate DynamicTableSource &
> > DynamicTableSink. But that will definitely require much design and
> > discussion, which deserves a dedicated FLIP. I prefer not to do that in this
> > FLIP to avoid overdesign, and I think it's not a must for this FLIP. Maybe
> > we can discuss it some day if we do need the new generic table interface.
> >
> > 2: Considering the variety of catalogs and tables, it's hard for Flink to do
> > unified follow-up actions after truncating a table. But the external
> > connector can still do such follow-up actions in the method `executeTruncation`.
> > Btw, in Spark, for the new truncate table interface[1], Spark only
> > recaches the table after truncating it[2]. I think if Flink
> > supports table cache at the framework level,
> > we can also recache at the framework level for the truncate table statement.
> >
> > [1]
> > https://github.com/apache/spark/blob/1a42aa5bd44e7524bb55463bbd85bea782715834/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
> > [2]
> > https://github.com/apache/spark/blob/06c09a79b371c5ac3e4ebad1118ed94b460f48d1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/TruncateTableExec.scala
> >
> >
> > I think the external catalog can implement such logic in the method
> > `executeTruncation`.
> >
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "liu ron" <ron9....@gmail.com>
> > To: "dev" <dev@flink.apache.org>
> > Sent: Tuesday, April 11, 2023 10:51:36 AM
> > Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> >
> > Hi, xia
> > It's a nice improvement to support the TRUNCATE TABLE statement, making Flink
> > more feature-rich.
> > I think the truncate syntax is a command that will be executed in the
> > client's process, rather than launching a Flink job to execute on the
> > cluster. So for the user-facing interface, I think we should not let
> > users implement the SupportsTruncate interface on the DynamicTableSink
> > interface. This seems a bit strange and also confuses users: as hang said,
> > why does a source table not support truncate? It would be nice if we could
> > come up with a generic interface that supports truncate instead of binding
> > it to the DynamicTableSink interface, and maybe in the future we will
> > support more commands like the truncate command.
> >
> > In addition, after truncating data, we may also need to update the metadata
> > of the table. For a Hive table, for example, we need to update the statistics
> > as well as clear the cache in the metastore. I think we should also consider
> > these capabilities. Spark has considered these; refer to
> >
> > https://github.com/apache/spark/blob/69dd20b5e45c7e3533efbfdc1974f59931c1b781/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L573
> > .
> >
> > Best,
> >
> > Ron
> >
> > On Tue, Apr 11, 2023 at 02:15, Jim Hughes <jhug...@confluent.io.invalid> wrote:
> >
> > > Hi Yuxia,
> > >
> > > On Mon, Apr 10, 2023 at 10:35 AM yuxia <luoyu...@alumni.sjtu.edu.cn>
> > > wrote:
> > >
> > > > Hi, Jim.
> > > >
> > > > 1: I'm expecting all DynamicTableSinks to support it. But it's hard to
> > > > support all of them in one shot. For the DynamicTableSinks that haven't
> > > > implemented the SupportsTruncate interface, we'll throw an exception
> > > > like 'The truncate statement for the table is not supported as it hasn't
> > > > implemented the interface SupportsTruncate'. Also, a sink that
> > > > doesn't support deleting data can still implement it but throw a more
> > > > concrete exception like "xxx doesn't support truncating a table as delete
> > > > is impossible for xxx". It depends on the external connector's
> > > > implementation.
> > > > Thanks for your advice, I have added it to the FLIP.
> > > >
> > >
> > > Makes sense.
> > >
> > >
> > > > 2: What do you mean by saying "truncate an input to a streaming query"?
> > > > This FLIP aims to support the TRUNCATE TABLE statement, which is for
> > > > truncating a table. In which case will it interoperate with streaming
> > > > queries?
> > > >
> > >
> > > Let's take a source like Kafka as an example. Suppose I have an input
> > > topic Foo, and a query which uses it as an input.
> > >
> > > When Foo is truncated, if the truncation works as a delete and create, then
> > > the connector may need to be made aware (otherwise it may try to use
> > > offsets from the previous topic). On the other hand, one may have to ask
> > > Kafka to delete records up to a certain point.
> > >
> > > Also, savepoints for the query may contain information from the truncated
> > > table.  Should this FLIP involve invalidating that information in some
> > > manner?  Or does truncating a source table for a query cause undefined
> > > behavior on that query?
> > >
> > > Basically, I'm trying to think through the implementations of a truncate
> > > operation to streaming sources and queries.
> > >
> > > Cheers,
> > >
> > > Jim
> > >
> > >
> > > > Best regards,
> > > > Yuxia
> > > >
> > > > ----- Original Message -----
> > > > From: "Jim Hughes" <jhug...@confluent.io.INVALID>
> > > > To: "dev" <dev@flink.apache.org>
> > > > Sent: Monday, April 10, 2023 9:32:28 PM
> > > > Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> > > >
> > > > Hi Yuxia,
> > > >
> > > > Two questions:
> > > >
> > > > 1.  Are you expecting all DynamicTableSinks to support Truncate? The FLIP
> > > > could use some explanation for what supporting and not supporting the
> > > > operation means.
> > > >
> > > > 2.  How will truncate interoperate with streaming queries? That is, if I
> > > > truncate an input to a streaming query, is there any defined behavior?
> > > >
> > > > Cheers,
> > > >
> > > > Jim
> > > >
> > > > On Wed, Mar 22, 2023 at 9:13 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> > > >
> > > > > Hi, devs.
> > > > >
> > > > > I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE
> > > > > statement [1].
> > > > >
> > > > > The TRUNCATE TABLE statement is a SQL command that allows users to quickly
> > > > > and efficiently delete all rows from a table without dropping the table
> > > > > itself. This statement is commonly used in data warehouses, where large data
> > > > > sets are frequently loaded into and unloaded from tables.
> > > > > So, this FLIP is meant to support the TRUNCATE TABLE statement. More exactly,
> > > > > this FLIP will bring Flink the TRUNCATE TABLE syntax and an interface with
> > > > > which the corresponding connectors can implement their own logic for
> > > > > truncating a table.
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > > [1]:
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > > > >
> > > > >
> > > > > Best regards,
> > > > > Yuxia
> > > > >
> > > >
> > >
> >
