Hi Aitozi,

Can we clarify the following:
1. What is the configuration for enabling the cache?
2. Which object is responsible for maintaining the cache? The Table class,
a static object, or an object managed centrally by the computing engine?
3. Can the cache also be applied to manifest files?

Best,
Jingsong

On Wed, Jul 24, 2024 at 10:30 AM Aitozi <gjying1...@gmail.com> wrote:
>
> Hi Jingsong
>      I have updated the wiki with the API section. Please review it again.
>
> Thanks,
> Aitozi
>
> On Tue, Jul 23, 2024 at 6:20 PM Jingsong Li <jingsongl...@gmail.com> wrote:
>
> > Thanks Aitozi for starting this discussion.
> >
> > +1 to have a block cache.
> >
> > I suggest adding which parts of the code need to be modified and what the
> > core API is.
> >
> > Best,
> > Jingsong
> >
> > On Tue, Jul 16, 2024 at 5:59 PM Aitozi <gjying1...@gmail.com> wrote:
> > >
> > > Hi, wj
> > >     Thanks for your comments.
> > > (1) In an OLAP system, the same query may be executed multiple times, and
> > > different snapshots may share the same data files, so caching reduces how
> > > often data has to be fetched from remote storage.
> > > (2) Both CachedSeekableInputStream and BlockCache will be used: the
> > > CachedSeekableInputStream uses the BlockCache to locate the target block.
> > > (3) The BlockQueue holds the list of available blocks that can be used to
> > > store data.
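A minimal Java sketch of how the pieces in (2) and (3) could fit together, assuming a fixed pool of equally sized blocks, a single reader thread, and LRU eviction (the eviction policy is an assumption; the thread does not specify one). RemoteReader, getOrLoad, copy, and freeSlots are hypothetical names, not Paimon's API. The free-slot queue plays the role described for BlockQueue, and the stream maps each position to a block and asks the cache for it.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: every type and method name here is illustrative, not Paimon's API.

/** Positional read against remote storage; returns -1 at end of file. */
interface RemoteReader {
    int read(long pos, byte[] buf, int off, int len) throws IOException;
}

/** Fixed pool of equally sized blocks shared by all streams (single-threaded sketch). */
final class BlockCache {
    final int blockSize;
    int misses = 0;
    private final byte[][] slots;
    private final int[] lengths; // valid bytes per slot; a file's last block may be short
    // Free slots ready to receive data: the role the thread describes for BlockQueue.
    private final ArrayDeque<Integer> freeSlots = new ArrayDeque<>();
    // "file#blockId" -> slot, in access order so the eldest entry is the LRU block.
    private final LinkedHashMap<String, Integer> index = new LinkedHashMap<>(16, 0.75f, true);

    BlockCache(int capacity, int blockSize) {
        this.blockSize = blockSize;
        this.slots = new byte[capacity][blockSize];
        this.lengths = new int[capacity];
        for (int i = 0; i < capacity; i++) freeSlots.add(i);
    }

    /** Returns the slot holding the block, reading it from remote storage on a miss. */
    int getOrLoad(String file, long blockId, RemoteReader reader) throws IOException {
        String key = file + "#" + blockId;
        Integer cached = index.get(key); // a hit also refreshes the LRU order
        if (cached != null) return cached;
        misses++;
        if (freeSlots.isEmpty()) {
            // No free block left: evict the least recently used one and recycle its slot.
            Map.Entry<String, Integer> eldest = index.entrySet().iterator().next();
            int victim = eldest.getValue();
            index.remove(eldest.getKey());
            freeSlots.add(victim);
        }
        int slot = freeSlots.poll();
        byte[] buf = slots[slot];
        int n = 0;
        while (n < buf.length) {
            int k = reader.read(blockId * blockSize + n, buf, n, buf.length - n);
            if (k <= 0) break; // end of file
            n += k;
        }
        lengths[slot] = n;
        index.put(key, slot);
        return slot;
    }

    /** Copies up to len cached bytes starting at offsetInBlock; returns the count. */
    int copy(int slot, int offsetInBlock, byte[] dst, int off, int len) {
        int n = Math.max(0, Math.min(len, lengths[slot] - offsetInBlock));
        System.arraycopy(slots[slot], offsetInBlock, dst, off, n);
        return n;
    }
}

/** Seekable stream that serves every read through the shared BlockCache. */
final class CachedSeekableInputStream {
    private final String file;
    private final long fileLength;
    private final RemoteReader reader;
    private final BlockCache cache;
    private long pos = 0;

    CachedSeekableInputStream(String file, long fileLength, RemoteReader reader, BlockCache cache) {
        this.file = file;
        this.fileLength = fileLength;
        this.reader = reader;
        this.cache = cache;
    }

    void seek(long newPos) { pos = newPos; }

    /** Reads up to len bytes at the current position; returns -1 at end of file. */
    int read(byte[] b, int off, int len) throws IOException {
        if (pos >= fileLength) return -1;
        int total = 0;
        while (total < len && pos < fileLength) {
            long blockId = pos / cache.blockSize;        // block that contains pos
            int inBlock = (int) (pos % cache.blockSize); // offset of pos inside that block
            int slot = cache.getOrLoad(file, blockId, reader);
            int n = cache.copy(slot, inBlock, b, off + total, len - total);
            if (n == 0) break;
            pos += n;
            total += n;
        }
        return total;
    }
}
```

With a RemoteReader backed by an in-memory array, reading the same range twice loads its block from remote storage only once, and a sequential scan larger than the pool evicts the least recently used blocks as new ones arrive.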
> > >
> > > Thanks,
> > > Aitozi.
> > >
> > > On Tue, Jul 16, 2024 at 5:42 PM wj wang <hongli....@gmail.com> wrote:
> > >
> > > > Thanks Aitozi for initiating this discussion.
> > > > I have some questions:
> > > >
> > > > (1) Why is this cache needed in the analytical scenario? When scanning a
> > > > snapshot, why would a data file be read multiple times?
> > > > (2) Between CachedSeekableInputStream and BlockCache, which
> > > > implementation do you prefer?
> > > > (3) Why does BlockCache introduce a BlockQueue?
> > > >
> > > > Best,
> > > > wangwj
> > > >
> > > > On Tue, Jul 16, 2024 at 3:07 PM Aitozi <gjying1...@gmail.com> wrote:
> > > > >
> > > > > Hi, Fang Yong
> > > > >
> > > > >     Thanks for your valuable comments. Here are some of my thoughts on
> > > > > your questions:
> > > > >
> > > > > (1) The distributed cache and the local file cache work at different
> > > > > layers, and their functions are orthogonal, so I believe the two can be
> > > > > used together. This proposal therefore focuses mainly on the local cache.
> > > > > (2) In our design, the scheduler uses a consistent hashing strategy to
> > > > > assign DataSplits to computing nodes, which enables cache-colocated
> > > > > scheduling.
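A minimal Java sketch of consistent-hash split assignment, assuming each DataSplit can be identified by a stable string key (for example partition, bucket, and file name; the key format is an assumption) and using CRC32 positions with virtual nodes on a TreeMap ring. SplitAssigner and its methods are hypothetical names, not the scheduler from the proposal.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical sketch: class, method, and key names are illustrative, not Paimon's scheduler API.
final class SplitAssigner {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // hash position -> node
    private final int virtualNodes;

    SplitAssigner(List<String> nodes, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String node : nodes) addNode(node);
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32(); // stand-in hash; any stable, well-mixed hash works
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        // Several virtual points per node spread its share of splits around the ring.
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        // Only remove points this node still owns (guards against hash collisions).
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i), node);
    }

    /** The node owning a split is the first ring point at or after the split's hash. */
    String nodeFor(String splitKey) {
        if (ring.isEmpty()) throw new IllegalStateException("no computing nodes");
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(splitKey));
        return (e != null ? e : ring.firstEntry()).getValue(); // wrap around the ring
    }
}
```

Because removing a node deletes only its own ring points, only the splits that node owned are reassigned; every other split keeps its node, and with it that node's warm local cache. This stability is what makes cache-colocated scheduling work as the cluster changes.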
> > > > >
> > > > > I have reposted the doc on the wiki page:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-24+Introduce+data+cache+in+paimon+reader
> > > > >
> > > > > Thanks,
> > > > > Aitozi.
> > > > >
> > > > > On Tue, Jul 16, 2024 at 2:37 PM Yong Fang <zjur...@gmail.com> wrote:
> > > > >
> > > > > > Thanks Aitozi for initiating this discussion. I have some questions
> > > > > > about the data cache:
> > > > > >
> > > > > > 1. The design document focuses mainly on the block cache. A complete
> > > > > > cache system is usually divided into a distributed cache, a local file
> > > > > > cache, a block cache, and a key-value cache. Compared with a block
> > > > > > cache, would it be more effective to introduce a distributed cache such
> > > > > > as Alluxio?
> > > > > >
> > > > > > 2. For the computing engine: what interfaces should Paimon's cache
> > > > > > provide so that the engine knows which computing nodes cache which data,
> > > > > > and the scheduling layer can place computing tasks on the appropriate
> > > > > > nodes?
> > > > > >
> > > > > > Best,
> > > > > > FangYong
> > > > > >
> > > > > > On Tue, Jul 16, 2024 at 10:45 AM Aitozi <gjying1...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi devs,
> > > > > > >     I want to initiate a discussion on supporting a data cache in the
> > > > > > > Paimon reader, aiming to accelerate scan operators in analytical
> > > > > > > scenarios. The detailed design document is at [1].
> > > > > > > Looking forward to your feedback.
> > > > > > >
> > > > > > > [1]:
> > > > > > > https://docs.google.com/document/d/1-zzDpxcubukMR-21n66OPv2ViKEFeEJ_Mivc-wW4gLM/edit?usp=sharing
> > > > > > >
> > > > > > > Thanks
> > > > > > > Aitozi.
> > > > > > >
> > > > > >
> > > >
> >
