+1. This can reduce the memory footprint in the Spark driver; it is great for ultra-big data.
Regards,
Jacky

> On January 14, 2020, at 4:38 PM, Indhumathi <indhumathi...@gmail.com> wrote:
>
> Hello all,
>
> In cloud scenarios, the index is too big to store in the Spark driver, since the VM may
> not have that much memory. Currently in Carbon, we load all indexes into the cache on the first query.
> Since the Carbon LRU cache does not support time-based expiration, indexes are removed from the cache
> based on a least-recently-used mechanism when the Carbon LRU cache is full.
>
> In some scenarios, where a user's table has many segments and the user often queries only a few of them,
> we do not need to load all indexes into the cache. For filter queries, if we prune and load only the matched
> segments into the cache, then the driver's memory is saved.
>
> For this purpose, I am planning to add block min/max to the segment metadata file, prune segments based on
> the segment files, and load the index only for the matched segments. As part of this, I will add a
> configurable carbon property '*carbon.load.all.index.to.cache*' to allow the user to load all indexes
> into the cache if needed. By default, the value will be true.
>
> Currently, for each load, we write a segment metadata file, which holds the information about the index file.
> During a query, we read each segment file to get the index file info and then load all datamaps for the segment.
> The min/max data will be encoded and stored in the segment file.
>
> Any suggestions/inputs from the community are appreciated.
>
> Thanks
> Indhumathi
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
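
To make the idea concrete, below is a minimal sketch (in Scala) of how segment-level min/max pruning could work once block min/max is rolled up into the segment metadata: only segments whose min/max range can match the filter would have their indexes loaded into the driver cache. The names SegmentMinMax and pruneSegments are illustrative only and are not existing CarbonData APIs; this is an assumption-based sketch, not the actual implementation.

    // Hypothetical sketch; SegmentMinMax and pruneSegments are illustrative names.
    // Per-segment metadata for a single numeric filter column, as it might look
    // after block min/max is aggregated into the segment metadata file.
    case class SegmentMinMax(segmentId: String, min: Long, max: Long)

    object SegmentPruneSketch {

      // Keep only segments whose [min, max] range can contain the filter value;
      // indexes would then be loaded for these segments alone.
      def pruneSegments(segments: Seq[SegmentMinMax], filterValue: Long): Seq[SegmentMinMax] =
        segments.filter(s => filterValue >= s.min && filterValue <= s.max)

      def main(args: Array[String]): Unit = {
        val segments = Seq(
          SegmentMinMax("0", 1L, 100L),
          SegmentMinMax("1", 101L, 200L),
          SegmentMinMax("2", 201L, 300L)
        )
        // An equality filter on value 150 matches only segment "1", so only that
        // segment's index would need to enter the driver's LRU cache.
        println(pruneSegments(segments, 150L).map(_.segmentId)) // List(1)
      }
    }

Presumably the proposed property would gate this path, e.g. setting 'carbon.load.all.index.to.cache' to false via CarbonProperties (something like CarbonProperties.getInstance().addProperty("carbon.load.all.index.to.cache", "false"), assuming the usual property mechanism) would enable segment pruning, while the default of true keeps the current behavior of loading all indexes.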