Hi Xuannan,

I came across FLIP-188 [1], which aims to introduce a built-in dynamic table storage that provides a unified changelog and table representation. Tables stored there can be used in further ad-hoc queries. As I understand it, this is quite similar to an implementation of caching in the Table API, and the ad-hoc queries are somewhat like the subsequent steps in an interactive program.
As you replied, caching at the Table/SQL API level is the next step, as part of interactive programming in the Table API, which we all agree is the major scenario. What do you think about the relation between it and FLIP-188?

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage

On Wed, Dec 29, 2021 at 7:53 PM Xuannan Su <suxuanna...@gmail.com> wrote:

> Hi David,
>
> Thanks for sharing your thoughts.
>
> You are right that most people tend to use high-level API for
> interactive data exploration. Actually, there is
> the FLIP-36 [1] covering the cache API at Table/SQL API. As far as I
> know, it has been accepted but hasn’t been implemented. At the time
> when it is drafted, DataStream did not support Batch mode but Table
> API does.
>
> Now that the DataStream API does support batch processing, I think we
> can focus on supporting cache at DataStream first. It is still
> valuable for DataStream users and most of the work we do in this FLIP
> can be reused. So I want to limit the scope of this FLIP.
>
> After caching is supported at DataStream, we can continue from where
> FLIP-36 left off to support caching at Table/SQL API. We might have to
> re-vote FLIP-36 or draft a new FLIP. What do you think?
>
> Best,
> Xuannan
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
>
> On Wed, Dec 29, 2021 at 6:08 PM David Morávek <d...@apache.org> wrote:
>
> > Hi Xuannan,
> >
> > thanks for drafting this FLIP.
> >
> > One immediate thought, from what I've seen for interactive data exploration
> > with Spark, most people tend to use the higher level APIs, that allow for
> > faster prototyping (Table API in Flink's case). Should the Table API also
> > be covered by this FLIP?
> >
> > Best,
> > D.
> >
> > On Wed, Dec 29, 2021 at 10:36 AM Xuannan Su <suxuanna...@gmail.com> wrote:
> >
> > > Hi devs,
> > >
> > > I’d like to start a discussion about adding support to cache the
> > > intermediate result at DataStream API for batch processing.
> > >
> > > As the DataStream API now supports batch execution mode, we see users
> > > using the DataStream API to run batch jobs. Interactive programming is
> > > an important use case of Flink batch processing. And the ability to
> > > cache intermediate results of a DataStream is crucial to the
> > > interactive programming experience.
> > >
> > > Therefore, we propose to support caching a DataStream in Batch
> > > execution. We believe that users can benefit a lot from the change and
> > > encourage them to use DataStream API for their interactive batch
> > > processing work.
> > >
> > > Please check out the FLIP-205 [1] and feel free to reply to this email
> > > thread. Looking forward to your feedback!
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-205%3A+Support+Cache+in+DataStream+for+Batch+Processing
> > >
> > > Best,
> > > Xuannan