Hi Akash,

1. The cache can stay full for the whole time that loading is still running. The reason I mentioned invalidation is to avoid one case in particular: the cache becoming full before all of the targeted indexes are loaded.
When the server has just started, it is not good to keep pre-priming and swap out the earliest-loaded indexes. Maybe pre-prime needs to check the available cache capacity before loading an index, and otherwise stop pre-priming?

2. I think regex/wildcard is more flexible to use, such as:
   *.*                     for all dbs and tables
   test.*                  for all tables in the test db
   test.day_table_201908*  for tables that have the targeted prefix

3. Yes, you are right, firing a count(*) can do that.

On 2019/08/19 09:23:06, Akash Nilugal <akashnilu...@gmail.com> wrote:
> Hi manhua,
>
> Thanks for the inputs.
>
> 1. No need to take care separately to invalidate the cache. I agree that it
> will have a limit. Since we already have an eviction policy, when the next
> query comes, whenever required, it will evict and load the segments
> required, so better not to have a separate mechanism to invalidate the
> cache during pre-prime.
>
> 2.
> i. For configuration support of pre-prime, we can already have the database
> name or table name. About the regex support, we will note it, and based on
> other use cases and impacts, I will update the design document.
> ii. During load there is no need to load the table or read any
> configuration for pre-prime. During load pre-prime, just take the current
> new segment and load it into the cache.
>
> 3. For command support, can you please explain with more use cases? The
> current index server startup will load, and when you say command, even if I
> do count(*), that will load all the segments. So I think a new command
> won't be necessary.
>
> Please get back for any clarifications or doubts.
>
> Thanks
>
> Regards,
> Akash R Nilugal
>
> On Fri, Aug 16, 2019, 4:26 PM Akash Nilugal <akashnilu...@gmail.com> wrote:
>
> > Hi All,
> >
> > I have raised a jira and attached the design doc there. Please refer to
> >
> > CARBONDATA - 3492
> >
> > Regards,
> > Akash
> >
> > On Thu, Aug 15, 2019, 5:33 PM Akash Nilugal <akashnilu...@gmail.com>
> > wrote:
> >
> >> Hi Community,
> >>
> >> Currently, we have an index server which basically helps in distributed
> >> caching of the datamaps in a separate Spark application.
> >>
> >> The caching of the datamaps in the index server starts once a query is
> >> fired on the table for the first time. All the datamaps will be loaded
> >> if count(*) is fired, and only the required ones will be loaded for any
> >> filter query.
> >>
> >> Here the problem or the bottleneck is that, until a query is fired on
> >> the table, the caching won’t be done for the table datamaps.
> >>
> >> So consider a scenario where we are just loading data into the table for
> >> a whole day and then query it the next day, so all the segments will
> >> start loading into the cache. So the first query will be slow.
> >>
> >> What if we load the datamaps into the cache, or pre-prime the cache,
> >> without waiting for any query on the table?
> >>
> >> Yes, what if we load the cache after every load is done, what if we load
> >> the cache for all the segments at once, so that the first query need not
> >> do all this work, which makes it faster.
> >>
> >> Here I have attached the design document for the pre-priming of the
> >> cache in the index server. Please have a look at it; any suggestions or
> >> inputs on this are most welcome.
> >>
> >> https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> >>
> >> Regards,
> >>
> >> Akash R Nilugal
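
P.S. To make points 1 and 2 at the top of this mail a bit more concrete, here is a rough sketch. It is in Python only for brevity and is not CarbonData code or API: select_tables, preprime, the index_sizes map and the capacity number are all names I made up for illustration, and I am assuming the index size of each table is known before loading. It uses shell-style wildcards (Python's fnmatch) rather than full regex.

```python
from fnmatch import fnmatchcase


def select_tables(patterns, tables):
    """Return the "db.table" names that match at least one wildcard pattern."""
    return [t for t in tables if any(fnmatchcase(t, p) for p in patterns)]


def preprime(tables, index_sizes, capacity):
    """Load indexes in order, and stop before the cache would overflow.

    Hypothetical model: index_sizes maps "db.table" to its index size and
    capacity is the free cache space, both in the same unit.
    Returns (loaded, skipped); nothing already loaded is ever evicted.
    """
    loaded, used = [], 0
    for i, table in enumerate(tables):
        size = index_sizes[table]
        if used + size > capacity:
            # Not enough room left: stop pre-priming instead of swapping
            # out the indexes that were loaded earliest.
            return loaded, tables[i:]
        loaded.append(table)
        used += size
    return loaded, []


if __name__ == "__main__":
    tables = [
        "test.day_table_20190801",
        "test.day_table_20190715",
        "test.users",
        "prod.orders",
    ]
    print(select_tables(["test.day_table_201908*"], tables))
    print(select_tables(["test.*"], tables))
    print(select_tables(["*.*"], tables))

    sizes = {"a.t1": 40, "a.t2": 50, "a.t3": 30}
    print(preprime(["a.t1", "a.t2", "a.t3"], sizes, capacity=100))
```

With this toy input the last call loads a.t1 and a.t2 and leaves a.t3 for the normal query path, which is the behaviour I was asking about: pre-prime gives up once the cache is full rather than evicting what it has just loaded.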