Re: [DISCUSS] Introduce data cache in Paimon reader

wj wang Tue, 16 Jul 2024 02:32:34 -0700

Thanks Aitozi for initiating this discussion.
I have some questions:

(1) Why is this cache needed in the analysis scenario? When scanning a
snapshot, why would a data file be read multiple times?
(2) Between CachedSeekableInputStream and BlockCache, which
implementation do you prefer?
(3) In BlockCache, why is a BlockQueue introduced?


Best,
wangwj

On Tue, Jul 16, 2024 at 3:07 PM Aitozi <[email protected]> wrote:
>
> Hi, Fang Yong
>
>     Thanks for your valuable comments. Here are some of my thoughts on your
> questions:
>
> (1) The distributed cache and local file cache actually work in different
> locations, and their functions are orthogonal.
> Therefore, I believe that these two can be used together, so this proposal
> mainly focuses on the local cache.
> (2) In our design, the scheduler utilizes the consistent hash strategy to
> assign DataSplits to computing nodes,
> enabling cache colocation scheduling.
>
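For illustration, the consistent hash assignment of DataSplits to computing nodes mentioned in (2) could look roughly like the sketch below. It is not taken from the PIP or from Paimon's code: the class name ConsistentHashAssigner, the use of a file path as the split key, the CRC32 hash, and the virtual-node count are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Paimon: maps a split key to a node
// using a consistent hash ring with virtual nodes.
final class ConsistentHashAssigner {
    // Ring position -> owning node. TreeMap keeps positions sorted.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashAssigner(List<String> nodes, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String node : nodes) {
            addNode(node);
        }
    }

    // Place several virtual points per node to smooth the distribution.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Remove only the points still owned by this node, so a rare hash
    // collision with another node's point does not delete that point.
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i), node);
        }
    }

    // Walk clockwise from the key's position to the first node point,
    // wrapping around to the start of the ring if necessary.
    String nodeFor(String splitKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes registered");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(splitKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // CRC32 is used only to keep the sketch short; a real scheduler
    // would likely pick a hash with better dispersion.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        ConsistentHashAssigner assigner =
                new ConsistentHashAssigner(List.of("node-1", "node-2", "node-3"), 64);
        // The split key here is a made-up data file path.
        String key = "bucket-0/data-42.orc";
        System.out.println(key + " -> " + assigner.nodeFor(key));
    }
}
```

The point of this layout is that the same split key always lands on the same node, and when a node leaves, only the splits it owned move elsewhere, so the local caches on the remaining nodes stay warm.
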
> I have reposted the doc on the wiki page:
>
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-24+Introduce+data+cache+in+paimon+reader
>
> Thanks,
> Aitozi.
>
> On Tue, Jul 16, 2024 at 14:37, Yong Fang <[email protected]> wrote:
>
> > Thanks Aitozi for initiating this discussion. For the data cache, I have
> > some questions:
> >
> > 1. The design document focuses mainly on block cache. A complete cache
> > system is usually divided into distributed cache, local file cache,
> > block cache, and key-value cache. Compared with block cache, would it
> > be more effective to introduce a distributed cache such as Alluxio?
> >
> > 2. For the computing engine: What interfaces should Paimon's cache provide
> > so that the computing engine can be aware of which computing nodes cache
> > which data, and facilitate the deployment of computing tasks to the
> > appropriate computing nodes at the scheduling layer?
> >
> > Best,
> > FangYong
> >
> > On Tue, Jul 16, 2024 at 10:45 AM Aitozi <[email protected]> wrote:
> >
> > > Hi devs:
> > >     I want to initiate a discussion on supporting a data cache in the
> > > Paimon reader, aiming to accelerate scan operators in analytical
> > > scenarios. The detailed design document is as follows [1].
> > > Looking forward to your feedback.
> > >
> > >
> > > [1]:
> > >
> > >
> > > https://docs.google.com/document/d/1-zzDpxcubukMR-21n66OPv2ViKEFeEJ_Mivc-wW4gLM/edit?usp=sharing
> > >
> > > Thanks
> > > Aitozi.
> > >
> >
