Hi, I see the current implementation is using a static field through Format?
Maybe it is better to put the cache in FileIO? Through catalog options?

Best,
Jingsong

On Wed, Jul 31, 2024 at 2:49 PM Aitozi <gjying1...@gmail.com> wrote:
>
> Hi guys:
>     Are there any further comments on this proposal? If not, I would like
> to start a voting thread.
>
> Thanks,
> Aitozi.
>
> Aitozi <gjying1...@gmail.com> wrote on Wed, Jul 24, 2024 at 19:46:
>
> > Hi Jingsong:
> >
> > 1. The scan.blockcache.enabled option decides whether to enable the cache.
> > 2. A static object (BlockCacheManager) maintains a singleton BlockCache.
> > 3. It is currently not used for the manifest.
> >
> > I have just opened a PoC PR for a closer look:
> > https://github.com/apache/paimon/pull/3807
> >
> > Thanks,
> > Aitozi
> >
> > Jingsong Li <jingsongl...@gmail.com> wrote on Wed, Jul 24, 2024 at 16:23:
> >
> >> Hi Aitozi,
> >>
> >> Can we clarify the following:
> >> 1. What is the configuration for enabling the cache?
> >> 2. What object is responsible for maintaining the cache? The Table
> >> class? A static object? Unified management by computing engine objects?
> >> 3. Can the cache be applied to the manifest?
> >>
> >> Best,
> >> Jingsong
> >>
> >> On Wed, Jul 24, 2024 at 10:30 AM Aitozi <gjying1...@gmail.com> wrote:
> >> >
> >> > Hi Jingsong,
> >> >     I have updated the wiki with the API section. Please review it
> >> > again.
> >> >
> >> > Thanks,
> >> > Aitozi
> >> >
> >> > Jingsong Li <jingsongl...@gmail.com> wrote on Tue, Jul 23, 2024 at 18:20:
> >> >
> >> > > Thanks Aitozi for starting this discussion.
> >> > >
> >> > > +1 to have a block cache.
> >> > >
> >> > > I suggest you add a section on what we need to modify and what the
> >> > > core API is.
> >> > >
> >> > > Best,
> >> > > Jingsong
> >> > >
> >> > > On Tue, Jul 16, 2024 at 5:59 PM Aitozi <gjying1...@gmail.com> wrote:
> >> > > >
> >> > > > Hi, wj
> >> > > >     Thanks for your comments.
> >> > > > (1) In an OLAP system, the same query may be executed multiple
> >> > > > times, and different snapshots may share the same data file.
> >> > > > Therefore, caching can help reduce the need to fetch data from
> >> > > > remote storage.
> >> > > > (2) Both CachedSeekableInputStream and BlockCache will be used; the
> >> > > > CachedSeekableInputStream will use the BlockCache to find the
> >> > > > target block.
> >> > > > (3) A BlockQueue holds the list of available blocks that can be
> >> > > > used to store data.
> >> > > >
> >> > > > Thanks,
> >> > > > Aitozi.
> >> > > >
> >> > > > wj wang <hongli....@gmail.com> wrote on Tue, Jul 16, 2024 at 17:42:
> >> > > >
> >> > > > > Thanks Aitozi for initiating this discussion.
> >> > > > > I have some questions:
> >> > > > >
> >> > > > > (1) Why is this cache needed in the analysis scenario? When
> >> > > > > scanning a snapshot, why would a data file be read multiple
> >> > > > > times?
> >> > > > > (2) Between CachedSeekableInputStream and BlockCache, which
> >> > > > > implementation do you prefer?
> >> > > > > (3) In BlockCache, why introduce a BlockQueue?
> >> > > > >
> >> > > > > Best,
> >> > > > > wangwj
> >> > > > >
> >> > > > > On Tue, Jul 16, 2024 at 3:07 PM Aitozi <gjying1...@gmail.com> wrote:
> >> > > > > >
> >> > > > > > Hi, Fang Yong
> >> > > > > >
> >> > > > > > Thanks for your valuable comments. Here are some of my
> >> > > > > > thoughts on your questions:
> >> > > > > >
> >> > > > > > (1) The distributed cache and the local file cache work in
> >> > > > > > different locations, and their functions are orthogonal, so
> >> > > > > > the two can be used together. This proposal mainly focuses on
> >> > > > > > the local cache.
> >> > > > > > (2) In our design, the scheduler uses a consistent-hash
> >> > > > > > strategy to assign DataSplits to computing nodes, enabling
> >> > > > > > cache colocation scheduling.
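[The consistent-hash colocation described in answer (2) above can be sketched roughly as follows. This is an illustrative sketch only; `SplitAssignerSketch`, `nodeFor`, and the node names are assumptions for demonstration, not Paimon's actual scheduler API.]

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch of consistent-hash DataSplit-to-node assignment for
// cache colocation. All names here are assumptions, not Paimon code.
public final class SplitAssignerSketch {

    // Hash ring: position -> node. Each node is inserted at several
    // virtual positions to smooth the distribution of splits.
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private static final int VIRTUAL_NODES = 16;

    public SplitAssignerSketch(String... nodes) {
        for (String node : nodes) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    private static int hash(String s) {
        // Any stable non-negative hash works; stability is what keeps a
        // split pinned to the same node across queries, so that node's
        // cached blocks are reused.
        return s.hashCode() & 0x7fffffff;
    }

    // A split (identified here by its file path) maps to the first node
    // clockwise on the ring, so the same split always lands on the same
    // node while cluster membership is unchanged.
    public String nodeFor(String splitPath) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(splitPath));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        SplitAssignerSketch assigner =
                new SplitAssignerSketch("node-a", "node-b", "node-c");
        String first = assigner.nodeFor("bucket-0/data-1.orc");
        String second = assigner.nodeFor("bucket-0/data-1.orc");
        System.out.println(first.equals(second)); // same node both times
    }
}
```

[Because the mapping is stable, repeated queries touching the same split land on the node whose block cache is already warm, and only splits near a changed node move when membership changes.]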
> >> > > > > >
> >> > > > > > Reposting the doc on the wiki page:
> >> > > > > >
> >> > > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-24+Introduce+data+cache+in+paimon+reader
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Aitozi.
> >> > > > > >
> >> > > > > > Yong Fang <zjur...@gmail.com> wrote on Tue, Jul 16, 2024 at 14:37:
> >> > > > > >
> >> > > > > > > Thanks Aitozi for initiating this discussion. For the data
> >> > > > > > > cache, I have some questions:
> >> > > > > > >
> >> > > > > > > 1. The design document focuses mainly on the block cache. A
> >> > > > > > > complete cache system is usually divided into a distributed
> >> > > > > > > cache, a local file cache, a block cache, and a key-value
> >> > > > > > > cache. Compared with a block cache, would it be more
> >> > > > > > > effective to introduce a distributed cache such as Alluxio?
> >> > > > > > >
> >> > > > > > > 2. For the computing engine: what interfaces should Paimon's
> >> > > > > > > cache provide so that the computing engine can be aware of
> >> > > > > > > which computing nodes cache which data, and the scheduling
> >> > > > > > > layer can deploy computing tasks to the appropriate
> >> > > > > > > computing nodes?
> >> > > > > > >
> >> > > > > > > Best,
> >> > > > > > > FangYong
> >> > > > > > >
> >> > > > > > > On Tue, Jul 16, 2024 at 10:45 AM Aitozi <gjying1...@gmail.com> wrote:
> >> > > > > > >
> >> > > > > > > > Hi devs:
> >> > > > > > > >     I want to initiate a discussion on the ability to
> >> > > > > > > > support data cache in the Paimon reader, aiming to
> >> > > > > > > > accelerate the performance of scan operators in analytical
> >> > > > > > > > scenarios. The detailed design document is as follows [1].
> >> > > > > > > > Looking forward to your feedback.
> >> > > > > > > >
> >> > > > > > > > [1]:
> >> > > > > > > > https://docs.google.com/document/d/1-zzDpxcubukMR-21n66OPv2ViKEFeEJ_Mivc-wW4gLM/edit?usp=sharing
> >> > > > > > > >
> >> > > > > > > > Thanks,
> >> > > > > > > > Aitozi.
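[Taken together, the design discussed in this thread (a `scan.blockcache.enabled` switch, a static manager holding a singleton block cache, and reads served block-by-block) might look roughly like the following minimal sketch. All class and method names, the block size, and the LRU policy are illustrative assumptions, not Paimon's actual classes.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of a process-wide singleton block cache, assuming an LRU
// policy keyed by (file path, block index). Names are illustrative only.
public final class BlockCacheSketch {

    static final int BLOCK_SIZE = 4 * 1024; // example block size, not a Paimon default

    // LRU cache of blocks, bounded by a maximum number of cached blocks.
    public static final class BlockCache extends LinkedHashMap<String, byte[]> {
        private final int maxBlocks;

        public BlockCache(int maxBlocks) {
            super(16, 0.75f, true); // access-order iteration gives LRU eviction
            this.maxBlocks = maxBlocks;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
            return size() > maxBlocks; // evict the least-recently-used block
        }
    }

    // Mirrors the "static object (BlockCacheManager) maintains a singleton
    // BlockCache" answer from the thread.
    public static final class BlockCacheManager {
        private static final BlockCache INSTANCE = new BlockCache(1024);

        public static BlockCache get() {
            return INSTANCE;
        }
    }

    public static String blockKey(String path, long blockIndex) {
        return path + "#" + blockIndex;
    }

    public static void main(String[] args) {
        BlockCache cache = BlockCacheManager.get();
        cache.put(blockKey("s3://bucket/data-1.orc", 0), new byte[BLOCK_SIZE]);
        // A later read of the same block is served from the cache instead
        // of being fetched again from remote storage.
        System.out.println(cache.containsKey(blockKey("s3://bucket/data-1.orc", 0)));
    }
}
```

[In this sketch, a stream in the spirit of the thread's CachedSeekableInputStream would look up `blockKey(path, position / BLOCK_SIZE)` on each read and fall back to the remote stream on a miss.]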