Re: [DISCUSS] FLIP-205: Support cache in DataStream for Batch Processing

Xuannan Su Wed, 29 Dec 2021 03:53:49 -0800

Hi David,

Thanks for sharing your thoughts.

You are right that most people tend to use the high-level APIs for
interactive data exploration. In fact, FLIP-36 [1] already covers the
cache API for the Table/SQL API. As far as I know, it has been accepted
but hasn’t been implemented. At the time it was drafted, the DataStream
API did not support batch mode, but the Table API did.

Now that the DataStream API supports batch processing, I think we
can focus on supporting caching in the DataStream API first. It is
still valuable for DataStream users, and most of the work we do in this
FLIP can be reused for the Table/SQL API later. So I would like to limit
the scope of this FLIP to the DataStream API.

Once caching is supported in the DataStream API, we can pick up where
FLIP-36 left off and support caching in the Table/SQL API. We might have
to re-vote on FLIP-36 or draft a new FLIP. What do you think?

Best,
Xuannan

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink

On Wed, Dec 29, 2021 at 6:08 PM David Morávek <d...@apache.org> wrote:
>
> Hi Xuannan,
>
> thanks for drafting this FLIP.
>
> One immediate thought, from what I've seen for interactive data exploration
> with Spark, most people tend to use the higher level APIs, that allow for
> faster prototyping (Table API in Flink's case). Should the Table API also
> be covered by this FLIP?
>
> Best,
> D.
>
> On Wed, Dec 29, 2021 at 10:36 AM Xuannan Su <suxuanna...@gmail.com> wrote:
>
> > Hi devs,
> >
> > I’d like to start a discussion about adding support for caching
> > intermediate results in the DataStream API for batch processing.
> >
> > As the DataStream API now supports batch execution mode, we see users
> > using the DataStream API to run batch jobs. Interactive programming is
> > an important use case of Flink batch processing. And the ability to
> > cache intermediate results of a DataStream is crucial to the
> > interactive programming experience.
> >
> > Therefore, we propose to support caching a DataStream in batch
> > execution mode. We believe that users can benefit a lot from the
> > change and that it will encourage them to use the DataStream API for
> > their interactive batch processing work.
> >
> > Please check out the FLIP-205 [1] and feel free to reply to this email
> > thread. Looking forward to your feedback!
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-205%3A+Support+Cache+in+DataStream+for+Batch+Processing
> >
> > Best,
> > Xuannan
> >
