+1. This can reduce the memory footprint in the Spark driver; it is great for ultra-big data.
Regards,
Jacky

> On January 14, 2020, at 4:38 PM, Indhumathi <indhumathi...@gmail.com> wrote:
>
> Hello all,
>
> In cloud scenarios, the index is too big to store in the Spark driver, since the VM may
> not have that much memory. Currently in Carbon, we load all indexes into the cache on the first query.
> Since the Carbon LRU cache does not support time-based expiration, indexes are removed from the cache
> based on a least-recently-used mechanism when the Carbon LRU cache is full.
>
> In some scenarios, where a user's table has many segments and the user often queries only a few of them,
> we do not need to load all indexes into the cache. For filter queries, if we prune and load only the matched
> segments into the cache, then the driver's memory is saved.
>
> For this purpose, I am planning to add block min/max to the segment metadata file, prune segments based on
> the segment files, and load the index only for the matched segments. As part of this, I will add a
> configurable carbon property '*carbon.load.all.index.to.cache*' to allow the user to load all indexes
> into the cache if needed. By default, the value will be true.
>
> Currently, for each load, we write a segment metadata file, which holds the information about the index file.
> During a query, we read each segment file to get the index file info and then load all datamaps for the segment.
> The min/max data will be encoded and stored in the segment file.
>
> Any suggestions/inputs from the community are appreciated.
>
> Thanks
> Indhumathi
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
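
To make the idea concrete, below is a minimal sketch (in Scala) of how segment-level min/max pruning could work once block min/max is rolled up into the segment metadata: only segments whose min/max range can match the filter would have their indexes loaded into the driver cache. The names SegmentMinMax and pruneSegments are illustrative only and are not existing CarbonData APIs; this is an assumption-based sketch, not the actual implementation.

    // Hypothetical sketch; SegmentMinMax and pruneSegments are illustrative names.
    // Per-segment metadata for a single numeric filter column, as it might look
    // after block min/max is aggregated into the segment metadata file.
    case class SegmentMinMax(segmentId: String, min: Long, max: Long)

    object SegmentPruneSketch {

      // Keep only segments whose [min, max] range can contain the filter value;
      // indexes would then be loaded for these segments alone.
      def pruneSegments(segments: Seq[SegmentMinMax], filterValue: Long): Seq[SegmentMinMax] =
        segments.filter(s => filterValue >= s.min && filterValue <= s.max)

      def main(args: Array[String]): Unit = {
        val segments = Seq(
          SegmentMinMax("0", 1L, 100L),
          SegmentMinMax("1", 101L, 200L),
          SegmentMinMax("2", 201L, 300L)
        )
        // An equality filter on value 150 matches only segment "1", so only that
        // segment's index would need to enter the driver's LRU cache.
        println(pruneSegments(segments, 150L).map(_.segmentId)) // List(1)
      }
    }

Presumably the proposed property would gate this path, e.g. setting 'carbon.load.all.index.to.cache' to false via CarbonProperties (something like CarbonProperties.getInstance().addProperty("carbon.load.all.index.to.cache", "false"), assuming the usual property mechanism) would enable segment pruning, while the default of true keeps the current behavior of loading all indexes.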