Hi Manhua,

1. You are right that the cache will be full at some point. As you say, if
we stop pre-priming, the next query will try to load into the cache and, if
it does not get enough space, it will evict and then load. Pre-priming does
the same thing, so LRU will handle that for us. I will still think about
this and let you know, and if feasible I will update the design.
Maybe we can stop pre-priming once the cache is full; I'll update this once
it is finalised.

2. Wildcard support is also fine, as per your input. In the initial stage,
load and pre-prime come first; we can provide regex support after that.

Thank you for the suggestion.

On 2019/08/19 09:53:10, Manhua <kevinjmh...@gmail.com> wrote:
> Hi Akash,
>
> 1. The cache will be full at some point if loading keeps running all the
> time. The reason I mentioned invalidation is to avoid the case where the
> cache is full before all targeted indexes are loaded.
>
> When the server is just starting, continuing to pre-prime and swapping out
> the earliest-loaded indexes is not good.
> Maybe pre-prime needs to check the available cache capacity before loading
> an index, and otherwise stop pre-priming?
>
> 2. I think regex/wildcard is more flexible to use, such as:
>     *.*                     for all dbs and tables
>     test.*                  for all tables in the test db
>     test.day_table_201908*  for tables with the targeted prefix
>
> 3. Yes, you are right, firing a count(*) can do that.
>
>
> On 2019/08/19 09:23:06, Akash Nilugal <akashnilu...@gmail.com> wrote:
> > Hi Manhua,
> >
> > Thanks for the inputs.
> >
> > 1. There is no need to invalidate the cache separately; I agree that it
> > will have a limit. Since we already have an eviction policy, the next
> > query will evict and load the required segments whenever needed, so it
> > is better not to have a separate mechanism to invalidate the cache
> > during pre-prime.
> >
> > 2.
> > i. For configuration support of pre-prime, we can already specify the
> > database name or table name. We will note the regex support, and based
> > on other use cases and impacts, I will update the design document.
> > ii. During load there is no need to load the table or read any
> > configuration for pre-prime. Pre-prime during load just takes the
> > current new segment and loads it into the cache.
> >
> > 3. For command support, can you please explain with more use cases?
> > Because the index server will already load on startup, and even a
> > count(*) will load all the segments, I think a new command won't be
> > necessary.
> >
> > Please get back to me for any clarifications or doubts.
> >
> > Thanks
> >
> > Regards,
> > Akash R Nilugal
> >
> > On Fri, Aug 16, 2019, 4:26 PM Akash Nilugal <akashnilu...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I have raised a JIRA and attached the design doc there. Please refer to
> > >
> > > CARBONDATA-3492
> > >
> > > Regards,
> > > Akash
> > >
> > > On Thu, Aug 15, 2019, 5:33 PM Akash Nilugal <akashnilu...@gmail.com>
> > > wrote:
> > >
> > >> Hi Community,
> > >>
> > >> Currently, we have an index server which helps in distributed caching
> > >> of the datamaps in a separate Spark application.
> > >>
> > >> Caching of the datamaps in the index server starts once a query is
> > >> fired on the table for the first time. All the datamaps are loaded if
> > >> a count(*) is fired, and only the required ones are loaded for a
> > >> filter query.
> > >>
> > >> The bottleneck here is that until a query is fired on the table, no
> > >> caching is done for the table's datamaps.
> > >>
> > >> Consider a scenario where we load data into a table for a whole day
> > >> and query it the next day: all the segments start loading into the
> > >> cache then, so the first query will be slow.
> > >>
> > >> What if we load the datamaps into the cache, or pre-prime the cache,
> > >> without waiting for any query on the table?
> > >>
> > >> That is, what if we load the cache after every load is done, or load
> > >> the cache for all the segments at once, so that the first query need
> > >> not do all this work, which makes it faster?
> > >>
> > >> I have attached the design document for pre-priming the cache in the
> > >> index server.
> > >> Please have a look at it; any suggestions or inputs on this are most
> > >> welcome.
> > >>
> > >> https://drive.google.com/file/d/1YUpDUv7ZPUyZQQYwQYcQK2t2aBQH18PB/view?usp=sharing
> > >>
> > >> Regards,
> > >>
> > >> Akash R Nilugal
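
For illustration only, the two ideas raised in this thread (db.table wildcard patterns for choosing what to pre-prime, and stopping pre-prime once the cache is full instead of evicting earlier entries) could be sketched roughly as below. This is not CarbonData code: `match_tables`, `pre_prime`, the table names, and the per-table index sizes are all hypothetical, and the sketch assumes the size of each index is known before loading, which may not hold in the real index server.

```python
from fnmatch import fnmatchcase


def match_tables(patterns, tables):
    """Return the 'db.table' names matching any shell-style wildcard pattern.

    Hypothetical helper: uses fnmatch-style wildcards, not full regex.
    """
    return [t for t in tables if any(fnmatchcase(t, p) for p in patterns)]


def pre_prime(tables, index_size_mb, capacity_mb, used_mb=0):
    """Load indexes in order until the next one would not fit, then stop.

    Nothing that is already cached gets evicted; eviction is left to the
    normal query path. Returns (loaded_tables, used_mb).
    """
    loaded = []
    for table in tables:
        size = index_size_mb[table]  # assumed to be known up front
        if used_mb + size > capacity_mb:
            break  # cache is full: stop pre-priming rather than swap out
        loaded.append(table)
        used_mb += size
    return loaded, used_mb


# Hypothetical catalogue and sizes, for demonstration only.
tables = [
    "test.day_table_20190801",
    "test.day_table_20190715",
    "test.users",
    "prod.orders",
]
sizes = {t: 40 for t in tables}

targets = match_tables(["test.*"], tables)       # the three tables in db "test"
loaded, used = pre_prime(targets, sizes, capacity_mb=100)
# loaded == ["test.day_table_20190801", "test.day_table_20190715"], used == 80
```

With these made-up numbers the third table is skipped because 80 + 40 exceeds the 100 MB capacity, so the two indexes loaded first stay in the cache rather than being swapped out.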