Re: [DISCUSS] FLIP-435: Introduce a New Dynamic Table for Simplifying Data Pipelines

Jing Zhang Wed, 13 Mar 2024 00:19:04 -0700

Hi, Lincoln & Ron,

Thanks for the proposal.


I agree with the question raised by Timo.

Besides, I have some other questions.
1. How is the query of a dynamic table defined?
With Flink SQL, or by introducing new syntax?
If Flink SQL is used, how will the differences in SQL between streaming and
batch processing be handled?
For example, a query that includes a window aggregate based on processing
time, or a query that includes a global ORDER BY?

2. Is modifying the query of a dynamic table allowed?
Or can a dynamic table only be refreshed based on its initial query?

3. How will a dynamic table be used?
A dynamic table seems similar to a materialized view. Will we do
something like materialized view rewriting during optimization?

Best,
Jing Zhang


Timo Walther <[email protected]> wrote on Wed, Mar 13, 2024 at 01:24:

> Hi Lincoln & Ron,
>
> thanks for proposing this FLIP. I think a design similar to what you
> propose has been in the heads of many people, however, I'm wondering how
> this will fit into the bigger picture.
>
> I haven't deeply reviewed the FLIP yet, but would like to ask some
> initial questions:
>
> Flink introduced the concept of Dynamic Tables many years ago. How
> does the term "Dynamic Table" fit into Flink's regular tables, and
> how does it relate to the Table API?
>
> I fear that adding the DYNAMIC TABLE keyword could cause confusion for
> users, because a term for regular CREATE TABLE (that can be "kind of
> dynamic" as well and is backed by a changelog) is then missing. Note
> also that we call our connectors for those tables DynamicTableSource
> and DynamicTableSink.
>
> In general, I find it contradictory that a TABLE can be "paused" or
> "resumed". From an English language perspective, this sounds
> incorrect. In my opinion (without much research yet), a continuously
> updating trigger should rather be modelled as a CREATE MATERIALIZED VIEW
> (which users are familiar with?) or a new concept such as a CREATE TASK
> (that can be paused and resumed?).
>
> How do you envision re-adding the functionality of a statement set, that
> fans out to multiple tables? This is a very important use case for data
> pipelines.
>
> Since the early days of Flink SQL, we have been discussing `SELECT STREAM *
> FROM T EMIT 5 MINUTES`. Your proposal seems to rephrase STREAM and EMIT
> as other keywords, DYNAMIC TABLE and FRESHNESS. But the core
> functionality is still there. I'm wondering if we should widen the scope
> (maybe not part of this FLIP but a new FLIP) to follow the standard more
> closely, making `SELECT * FROM t` bounded by default and using new syntax
> for the dynamic behavior. Flink 2.0 would be the perfect time for this;
> however, it would require careful discussions. What do you think?
>
> Regards,
> Timo
>
>
> On 11.03.24 08:23, Ron liu wrote:
> > Hi, Dev
> >
> >
> > Lincoln Lee and I would like to start a discussion about FLIP-435:
> > Introduce a New Dynamic Table for Simplifying Data Pipelines.
> >
> >
> > This FLIP is designed to simplify the development of data processing
> > pipelines. Using Dynamic Tables with uniform SQL statements and
> > freshness, users can define batch and streaming transformations on
> > data in the same way, accelerate ETL pipeline development, and manage
> > task scheduling automatically.
> >
> >
> > For more details, see FLIP-435 [1]. Looking forward to your feedback.
> >
> >
> > [1]
> >
> >
> > Best,
> >
> > Lincoln & Ron
> >
>
>
