Hi guys:
    Are there any further comments on this proposal? If not, I would like
to start a voting thread.

Thanks,
Aitozi.

Aitozi <[email protected]> wrote on Wed, Jul 24, 2024 at 19:46:

> Hi Jingsong:
>
> 1. The scan.blockcache.enabled option decides whether the cache is enabled.
> 2. A static object (BlockCacheManager) maintains a singleton BlockCache.
> 3. Currently the cache is not applied to the manifest.
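
A minimal sketch of points 1 and 2, to make the shape of the answer concrete. Only the names scan.blockcache.enabled, BlockCacheManager and BlockCache come from this thread. The getOrCreate method, the default of "false", the LRU policy and the capacity of 1024 blocks are assumptions made for illustration and are not taken from the PoC PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory cache: keeps at most maxBlocks entries, evicting the
// least recently used one. The eviction policy is an assumption.
final class BlockCache {
    private final Map<String, byte[]> blocks;

    BlockCache(final int maxBlocks) {
        // accessOrder = true makes the map iterate from least to most recently used.
        this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    synchronized byte[] get(String key) { return blocks.get(key); }
    synchronized void put(String key, byte[] data) { blocks.put(key, data); }
    synchronized int size() { return blocks.size(); }
}

// Static holder for one shared BlockCache per JVM, as described in point 2.
final class BlockCacheManager {
    private static volatile BlockCache instance;

    private BlockCacheManager() {}

    // Returns the shared cache, or null when scan.blockcache.enabled is not "true".
    // Treating a missing option as disabled is an assumption.
    static BlockCache getOrCreate(Map<String, String> options) {
        boolean enabled =
                Boolean.parseBoolean(options.getOrDefault("scan.blockcache.enabled", "false"));
        if (!enabled) {
            return null;
        }
        BlockCache local = instance;
        if (local == null) {
            synchronized (BlockCacheManager.class) {
                local = instance;
                if (local == null) {
                    instance = local = new BlockCache(1024); // hypothetical capacity
                }
            }
        }
        return local;
    }
}
```

With this shape, every reader in the same process that passes options with scan.blockcache.enabled=true receives the same BlockCache instance, while readers without the option bypass the cache entirely.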
>
> I just opened a PoC PR for a closer look:
> https://github.com/apache/paimon/pull/3807
>
> Thanks,
> Aitozi
>
> Jingsong Li <[email protected]> wrote on Wed, Jul 24, 2024 at 16:23:
>
>> Hi Aitozi,
>>
>> Can we clarify the following:
>> 1. What is the configuration for enabling the cache?
>> 2. Which object is responsible for maintaining the cache: the Table class,
>> a static object, or an object managed centrally by the computing engine?
>> 3. Can the cache be applied to the manifest?
>>
>> Best,
>> Jingsong
>>
>> On Wed, Jul 24, 2024 at 10:30 AM Aitozi <[email protected]> wrote:
>> >
>> > Hi Jingsong
>> >      I have updated the wiki with the API section. Please review it again.
>> >
>> > Thanks,
>> > Aitozi
>> >
>> > Jingsong Li <[email protected]> wrote on Tue, Jul 23, 2024 at 18:20:
>> >
>> > > Thanks Aitozi for starting this discussion.
>> > >
>> > > +1 to have a block cache.
>> > >
>> > > I suggest you add where we need to modify and what the core API is.
>> > >
>> > > Best,
>> > > Jingsong
>> > >
>> > > On Tue, Jul 16, 2024 at 5:59 PM Aitozi <[email protected]> wrote:
>> > > >
>> > > > Hi, wj
>> > > >     Thanks for your comments.
>> > > > (1) In an OLAP system, the same query may be executed multiple times,
>> > > > and different snapshots may share the same data file. Caching
>> > > > therefore reduces the need to fetch data from remote storage.
>> > > > (2) Both CachedSeekableInputStream and BlockCache will be used: the
>> > > > CachedSeekableInputStream uses the BlockCache to find the target block.
>> > > > (3) A BlockQueue holds the list of available blocks that can be used
>> > > > to store data.
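
Points (2) and (3) can be sketched roughly as follows. This is an illustration under stated assumptions, not the design from the wiki page: only the names CachedSeekableInputStream, BlockCache and BlockQueue appear in this thread. The classes PooledBlockCache and CachedSeekableInputStreamSketch, the FIFO eviction, the single-file block-index key, and the byte array standing in for remote storage are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Free list of preallocated fixed-size buffers (the "available blocks").
final class BlockQueue {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    BlockQueue(int blockCount, int blockSize) {
        if (blockCount < 1) throw new IllegalArgumentException("need at least one block");
        for (int i = 0; i < blockCount; i++) free.add(new byte[blockSize]);
    }

    byte[] poll() { return free.poll(); } // null when no block is available
}

// Maps a block index to its cached buffer. Buffers come from the BlockQueue;
// when the queue is empty the oldest cached block is evicted and reused (FIFO,
// chosen only for brevity).
final class PooledBlockCache {
    private final BlockQueue queue;
    private final Map<Long, byte[]> cached = new HashMap<>();
    private final ArrayDeque<Long> order = new ArrayDeque<>();

    PooledBlockCache(BlockQueue queue) { this.queue = queue; }

    byte[] get(long blockIndex) { return cached.get(blockIndex); }

    byte[] allocate(long blockIndex) {
        byte[] buffer = queue.poll();
        if (buffer == null) {
            buffer = cached.remove(order.poll());
        }
        cached.put(blockIndex, buffer);
        order.add(blockIndex);
        return buffer;
    }
}

// Seekable reader that resolves each position to a block and asks the cache first.
final class CachedSeekableInputStreamSketch {
    private final byte[] remote; // stand-in for a file on remote storage
    private final int blockSize;
    private final PooledBlockCache cache;
    private long position;
    int remoteReads; // counts cache misses, for demonstration only

    CachedSeekableInputStreamSketch(byte[] remote, int blockSize, PooledBlockCache cache) {
        this.remote = remote;
        this.blockSize = blockSize;
        this.cache = cache;
    }

    void seek(long newPosition) { position = newPosition; }

    // Returns the byte at the current position (0..255) and advances, or -1 at end of file.
    int read() {
        if (position >= remote.length) return -1;
        long blockIndex = position / blockSize;
        int offset = (int) (position % blockSize);
        byte[] block = cache.get(blockIndex);
        if (block == null) { // miss: take a block and fill it from "remote" storage
            block = cache.allocate(blockIndex);
            int start = (int) (blockIndex * blockSize);
            System.arraycopy(remote, start, block, 0, Math.min(blockSize, remote.length - start));
            remoteReads++;
        }
        position++;
        return block[offset] & 0xFF;
    }
}
```

In this sketch a second read that lands in an already cached block does not touch the remote data, which is the saving described in (1). The BlockQueue bounds memory because the cache can never hold more buffers than were preallocated.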
>> > > >
>> > > > Thanks,
>> > > > Aitozi.
>> > > >
>> > > > wj wang <[email protected]> wrote on Tue, Jul 16, 2024 at 17:42:
>> > > >
>> > > > > Thanks Aitozi for initiating this discussion.
>> > > > > I have some questions:
>> > > > >
>> > > > > (1) Why is this cache needed in the analysis scenario? When scanning
>> > > > > a snapshot, why would a data file be read multiple times?
>> > > > > (2) Between CachedSeekableInputStream and BlockCache, which
>> > > > > implementation do you prefer?
>> > > > > (3) In BlockCache, why introduce a BlockQueue?
>> > > > >
>> > > > > Best,
>> > > > > wangwj
>> > > > >
>> > > > > On Tue, Jul 16, 2024 at 3:07 PM Aitozi <[email protected]> wrote:
>> > > > > >
>> > > > > > Hi, Fang Yong
>> > > > > >
>> > > > > >     Thanks for your valuable comments. Here are some of my thoughts
>> > > > > > on your questions:
>> > > > > >
>> > > > > > (1) The distributed cache and the local file cache work at
>> > > > > > different locations, and their functions are orthogonal, so I
>> > > > > > believe the two can be used together. This proposal mainly
>> > > > > > focuses on the local cache.
>> > > > > > (2) In our design, the scheduler uses a consistent hash strategy
>> > > > > > to assign DataSplits to computing nodes, enabling cache
>> > > > > > colocation scheduling.
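
As an illustration of point (2), here is a minimal consistent-hash ring that maps a split key to a node. It is a generic sketch, not the scheduler referred to in this thread: the class ConsistentHashAssigner, the use of a file path as the split key, the CRC32 hash and the virtual-node count are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical assigner: places several virtual points per node on a ring and
// sends each split to the first point at or after the split's hash.
final class ConsistentHashAssigner {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashAssigner(Collection<String> nodes, int virtualNodes) {
        if (nodes.isEmpty() || virtualNodes < 1) {
            throw new IllegalArgumentException("need at least one node and one virtual node");
        }
        for (String node : nodes) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    // The same split key always maps to the same node while the node set is unchanged.
    String nodeFor(String splitKey) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(splitKey));
        if (entry == null) {
            entry = ring.firstEntry(); // wrap around the ring
        }
        return entry.getValue();
    }

    // CRC32 is used only because it is in the JDK and deterministic.
    private static long hash(String value) {
        CRC32 crc = new CRC32();
        crc.update(value.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

The property that matters for cache colocation is stability: when a node leaves, only the splits that were assigned to that node move, so the block caches on the remaining nodes stay warm.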
>> > > > > >
>> > > > > > I have reposted the doc on the wiki page:
>> > > > > >
>> > > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-24+Introduce+data+cache+in+paimon+reader
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Aitozi.
>> > > > > >
>> > > > > > Yong Fang <[email protected]> wrote on Tue, Jul 16, 2024 at 14:37:
>> > > > > >
>> > > > > > > Thanks Aitozi for initiating this discussion. For the data cache,
>> > > > > > > I have some questions:
>> > > > > > >
>> > > > > > > 1. The design document focuses mainly on the block cache. A
>> > > > > > > complete cache system is usually divided into a distributed cache,
>> > > > > > > a local file cache, a block cache, and a key-value cache. Compared
>> > > > > > > with a block cache, would it be more effective to introduce a
>> > > > > > > distributed cache such as Alluxio?
>> > > > > > >
>> > > > > > > 2. For the computing engine: what interfaces should Paimon's cache
>> > > > > > > provide so that the computing engine knows which computing nodes
>> > > > > > > cache which data, and can place computing tasks on the appropriate
>> > > > > > > nodes at the scheduling layer?
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > FangYong
>> > > > > > >
>> > > > > > > On Tue, Jul 16, 2024 at 10:45 AM Aitozi <[email protected]> wrote:
>> > > > > > >
>> > > > > > > > Hi devs:
>> > > > > > > >     I want to initiate a discussion on supporting a data cache in
>> > > > > > > > the Paimon reader, aiming to accelerate scan operators in
>> > > > > > > > analytical scenarios. The detailed design document is linked
>> > > > > > > > below [1]. Looking forward to your feedback.
>> > > > > > > >
>> > > > > > > > [1]:
>> > > > > > > > https://docs.google.com/document/d/1-zzDpxcubukMR-21n66OPv2ViKEFeEJ_Mivc-wW4gLM/edit?usp=sharing
>> > > > > > > >
>> > > > > > > > Thanks
>> > > > > > > > Aitozi.
>> > > > > > > >
>> > > > > > >
>> > > > >
>> > >
>>
>
