Re: [DISCUSS] FLIP-205: Support cache in DataStream for Batch Processing

Gen Luo Thu, 30 Dec 2021 02:59:44 -0800

Hi Xuannan,

I found FLIP-188 [1], which aims to introduce a built-in dynamic table
storage that provides a unified changelog & table representation. Tables
stored there can be used in further ad-hoc queries. To my understanding,
it's quite like an implementation of caching in the Table API, and the
ad-hoc queries are somewhat like further steps in an interactive program.


As you replied, caching in the Table/SQL API is the next step, as part of
interactive programming in the Table API, which we all agree is the major
scenario. How do you see the relation between it and FLIP-188?

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage


On Wed, Dec 29, 2021 at 7:53 PM Xuannan Su <[email protected]> wrote:

> Hi David,
>
> Thanks for sharing your thoughts.
>
> You are right that most people tend to use high-level APIs for
> interactive data exploration. Actually, FLIP-36 [1] already covers
> the cache API at the Table/SQL API level. As far as I know, it has
> been accepted but hasn’t been implemented. At the time it was
> drafted, DataStream did not support batch mode, but the Table API
> did.
>
> Now that the DataStream API does support batch processing, I think we
> can focus on supporting caching in DataStream first. It is still
> valuable for DataStream users, and most of the work we do in this FLIP
> can be reused. So I want to limit the scope of this FLIP.
>
> After caching is supported at DataStream, we can continue from where
> FLIP-36 left off to support caching at Table/SQL API. We might have to
> re-vote FLIP-36 or draft a new FLIP. What do you think?
>
> Best,
> Xuannan
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
>
>
>
> On Wed, Dec 29, 2021 at 6:08 PM David Morávek <[email protected]> wrote:
> >
> > Hi Xuannan,
> >
> > thanks for drafting this FLIP.
> >
> > One immediate thought: from what I've seen of interactive data
> > exploration with Spark, most people tend to use the higher-level APIs
> > that allow for faster prototyping (the Table API in Flink's case).
> > Should the Table API also be covered by this FLIP?
> >
> > Best,
> > D.
> >
> > On Wed, Dec 29, 2021 at 10:36 AM Xuannan Su <[email protected]> wrote:
> >
> > > Hi devs,
> > >
> > > I’d like to start a discussion about adding support for caching
> > > intermediate results in the DataStream API for batch processing.
> > >
> > > As the DataStream API now supports batch execution mode, we see users
> > > using the DataStream API to run batch jobs. Interactive programming is
> > > an important use case of Flink batch processing. And the ability to
> > > cache intermediate results of a DataStream is crucial to the
> > > interactive programming experience.
> > >
> > > Therefore, we propose to support caching a DataStream in batch
> > > execution. We believe that users can benefit a lot from the change,
> > > and that it will encourage them to use the DataStream API for their
> > > interactive batch processing work.
> > >
> > > Please check out the FLIP-205 [1] and feel free to reply to this email
> > > thread. Looking forward to your feedback!
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-205%3A+Support+Cache+in+DataStream+for+Batch+Processing
> > >
> > > Best,
> > > Xuannan
> > >
>
