Hi, Stas!

After talking with Anton and Alexy about IEP-40, I changed the description of the implementation; it is written as a response to Slava, here [1]. In short, I made three separate interfaces: the first is public and is for strategy configuration, the second is internal and is for strategy implementation, and the third is for possible delivery of strategies from different plugins.
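To make the three-interface split easier to picture, here is a minimal, self-contained sketch. It is only an illustration of the idea described above and does not use any Ignite classes; all names in it (WarmUpConfig, LoadAllWarmUpConfig, WarmUpStrategy, WarmUpStrategySupplier, CoreStrategies, WarmUpRegistry) are hypothetical, and the "warm-up" only returns a string instead of touching pages.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Public side (hypothetical): a plain configuration object the user sets. */
interface WarmUpConfig {
}

/** Hypothetical public configuration that asks for a "load everything" warm-up. */
final class LoadAllWarmUpConfig implements WarmUpConfig {
}

/** Internal side (hypothetical): the code that performs warm-up for one configuration type. */
interface WarmUpStrategy<C extends WarmUpConfig> {
    /** Configuration class this strategy handles. */
    Class<C> configClass();

    /** Stand-in for real work: returns a description instead of loading pages. */
    String warmUp(C cfg, String regionName);
}

/** Plugin side (hypothetical): lets a plugin contribute its own strategies. */
interface WarmUpStrategySupplier {
    List<WarmUpStrategy<?>> strategies();
}

/** Illustrative internal strategy for the "load everything" configuration. */
final class LoadAllStrategy implements WarmUpStrategy<LoadAllWarmUpConfig> {
    @Override public Class<LoadAllWarmUpConfig> configClass() {
        return LoadAllWarmUpConfig.class;
    }

    @Override public String warmUp(LoadAllWarmUpConfig cfg, String regionName) {
        return "load-all:" + regionName;
    }
}

/** Supplier that delivers the built-in strategy, the same way a plugin would. */
final class CoreStrategies implements WarmUpStrategySupplier {
    @Override public List<WarmUpStrategy<?>> strategies() {
        return Collections.singletonList(new LoadAllStrategy());
    }
}

/** Collects strategies from all suppliers and picks one by configuration class. */
final class WarmUpRegistry {
    private final Map<Class<?>, WarmUpStrategy<?>> byCfg = new LinkedHashMap<>();

    WarmUpRegistry(List<WarmUpStrategySupplier> suppliers) {
        for (WarmUpStrategySupplier sup : suppliers)
            for (WarmUpStrategy<?> st : sup.strategies())
                byCfg.put(st.configClass(), st);
    }

    @SuppressWarnings("unchecked")
    <C extends WarmUpConfig> String run(C cfg, String regionName) {
        WarmUpStrategy<C> st = (WarmUpStrategy<C>)byCfg.get(cfg.getClass());

        if (st == null)
            throw new IllegalArgumentException("No strategy for " + cfg.getClass().getName());

        return st.warmUp(cfg, regionName);
    }
}
```

With this shape the user only ever sees the configuration type, while the strategy lookup and execution stay internal; a plugin would add behavior by shipping another supplier.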
I will try to think about this and implement it. The warm-up phase will run before "discovery", so I'm not sure that it will be possible to connect via control.sh; perhaps it will be possible via JMX, but I think control.sh would be better.
> Will there be a way to interrupt warmup phase and continue startup (e.g. via
> JMX, REST and/or control.sh)? Can we have it please?

I was thinking about how and where to put the warm-up configuration, and I think it would be better to do it in IgniteConfiguration, since each strategy can work on caches, groups, regions, etc.
> I think that ideally warmup should be configured per-cache - I believe this
> is what a user would expect to do.
> However, cache configs are immutable. We need a way for existing users to
> enjoy the cache warmup feature, as well as for early adopters to switch to
> more sophisticated strategies as they will be released (or as their
> dataset grows).
> Because of that I propose to add the cache warmup configuration to the
> DataRegionConfiguration. Data regions can be changed between restarts,
> independently on each node allowing for a rolling change.

Possible.
> Will preloadPartition() method be deprecated together with this change? I
> assume yes?

I think it can be done as a new strategy, but this is at the discretion of the developers.
> How hard would it be to implement a "load all indexes, metapages and
> freelists" strategy in addition to the "load everything"?
> I think it would be an MVP for environments with datasets larger than RAM.
> A "load everything" strategy will not work in these environments pretty much
> at all, and "load indexes" will be a significant improvement to no warmup at
> all.

[1] - http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Cache-warmup-td48582.html#a48649

04.08.2020, 23:22, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
> Kirill,
>
> Thanks for driving this. This is awaited by many users.
>
> A few comments and questions.
>
> I would keep CacheWarmup interface purely internal and never view it as an
> interface which a user would be implementing.
> There are multiple reasons for that:
> - The logic of the cache warmup is very low-level; how is a user supposed to
>   know which pages they want?
> - A sophisticated strategy will require accessing private APIs for sure; say,
>   I need a strategy which loads the last known memory state before the restart;
>   how can I even implement that without breaking into various internals?
> - In fact there aren't many implementations which make sense ("load
>   everything", "load indexes", "load last memory state", "load N GB at
>   random"); every use case I've seen would be solved by a "load everything"
>   strategy (if disk is < RAM) or "load last memory state" strategy
> - Warmup will be a critical phase, and a custom user implementation is all
>   too likely to cause issues. We should avoid executing user code in critical
>   stages if we can help it
> To summarize, if we put warmup strategies in users' hands they will be hard
> to write, will require breaking into internals or a lot of additional public
> interfaces for these internals, will likely cause issues with the cluster,
> and everyone will be implementing the same few general strategies.
> Basically, I expect only fellow Ignite developers to be implementing their
> own strategies.
> Because of that I propose to keep the interfaces private, and only give a
> single public parameter. The parameter can take an enum of the supported
> strategies. New useful strategies should be added to Ignite codebase.
>
> Will there be a way to interrupt warmup phase and continue startup (e.g. via
> JMX, REST and/or control.sh)? Can we have it please?
>
> I think that ideally warmup should be configured per-cache - I believe this
> is what a user would expect to do.
> However, cache configs are immutable. We need a way for existing users to
> enjoy the cache warmup feature, as well as for early adopters to switch to
> more sophisticated strategies as they will be released (or as their dataset
> grows).
> Because of that I propose to add the cache warmup configuration to the
> DataRegionConfiguration. Data regions can be changed between restarts,
> independently on each node allowing for a rolling change.
>
> Will preloadPartition() method be deprecated together with this change? I
> assume yes?
>
> How hard would it be to implement a "load all indexes, metapages and
> freelists" strategy in addition to the "load everything"?
> I think it would be an MVP for environments with datasets larger than RAM.
> A "load everything" strategy will not work in these environments pretty much
> at all, and "load indexes" will be a significant improvement to no warmup at
> all.
>
> Thanks,
> Stan
>
>> On 4 Aug 2020, at 16:04, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>
>> Hi, Denis!
>>
>> For now, I suggest a simple warm-up implementation for the case when the
>> persistent storage is less than RAM. If others want to make additional
>> implementations, they can do it themselves by implementing interfaces. For
>> the first point, we need to figure out how and where we will remember pages,
>> etc. Perhaps for such tasks it will be necessary to make improvements in
>> kernel.
>>
>> In the "WarmUpStrategy#warmUp" method we get "GridKernalContext#cache", from
>> which we can get caches and groups through "GridCacheProcessor#cacheGroups",
>> "GridCacheProcessor#caches" and so on, and through them we can access pages.
>>> The second one requires direct work with data pages, but not with a cache
>>> context, so it's also impossible to implement.
>>
>> This requires writing additional custom code, which may run longer due to
>> its SQL features, and so on.
>> It would be more convenient for both developer and grid administrator to
>> just set a warm-up strategy.
>>> When loading of all cache data is required, it can be done by running a
>>> local scan query. It will iterate through all data pages and result in
>>> their allocation in memory.
>>
>> 04.08.2020, 15:25, "Denis Mekhanikov" <dmekhani...@gmail.com>:
>>> Kirill,
>>>
>>> When I discussed this functionality with Ignite users, I heard the
>>> following thoughts about warming up:
>>>
>>> - Node restarts affect performance of queries. The main reason for that
>>>   is that the pages that were loaded into memory before the restart are on
>>>   disk after the restart. It takes time to reach the same distribution of
>>>   data between memory and disk. Until that point the performance is usually
>>>   degraded. No simple rule like "load everything" helps here if only a part
>>>   of data fits in memory.
>>> - It would be nice to have a way to give preferences to indices when
>>>   doing a warmup. Usually indices are used more often than data nodes, so
>>>   loading indices first would bring more benefits.
>>>
>>> The first point can be addressed by implementing the policy that would
>>> restore the memory state that was observed before the restart. I don't see
>>> how it can be implemented using the suggested interface.
>>> The second one requires direct work with data pages, but not with a cache
>>> context, so it's also impossible to implement.
>>>
>>> When loading of all cache data is required, it can be done by running a
>>> local scan query. It will iterate through all data pages and result in
>>> their allocation in memory.
>>>
>>> So, I don't really see a scenario when the suggested API will help. Do you
>>> have a suitable use-case that will be covered?
>>>
>>> Denis
>>>
>>> Tue, 4 Aug 2020 at 13:42, ткаленко кирилл <tkalkir...@yandex.ru>:
>>>
>>>> Hi, Denis!
>>>>
>>>> Previously, I answered Slava about the implementation that I have in mind;
>>>> now it will be possible to add one's own warm-up strategy implementations,
>>>> which can be implemented in different ways.
>>>>
>>>> At the moment, I suggest implementing one "Load all" strategy, which will
>>>> be effective if persistent storage is less than RAM.
>>>>
>>>> 28.07.2020, 19:46, "Denis Mekhanikov" <dmekhani...@gmail.com>:
>>>> > Kirill,
>>>> >
>>>> > That will be a great feature! Other popular databases already have it
>>>> > (e.g. Postgres: https://www.postgresql.org/docs/11/pgprewarm.html), so
>>>> > it's good that we're also going to have it in Ignite.
>>>> >
>>>> > What implementation of CacheWarmup interface do you have in mind? Will
>>>> > there be some preconfigured implementation, and will users be able to
>>>> > implement it themselves?
>>>> >
>>>> > Do you think it should be cache-based? I would say that a
>>>> > DataRegion-based warm-up would come more naturally. Page IDs that are
>>>> > loaded into the data region can be dumped periodically to disk and
>>>> > recovered on restarts. This is more or less how it works in Postgres.
>>>> > I'm afraid that if we make it cache-based, the implementation won't be
>>>> > that obvious. We already have an API for warmup that appeared to be
>>>> > pretty much impossible to apply in a useful way:
>>>> > https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteCache.html#preloadPartition-int-
>>>> > Let's make sure that our new tool for warming up is actually useful.
>>>> >
>>>> > Denis
>>>> >
>>>> > Tue, 28 Jul 2020 at 09:17, Zhenya Stanilovsky <arzamas...@mail.ru.invalid>:
>>>> >
>>>> >> Looks like we need additional func for static caches, for
>>>> >> example: warmup(List<CacheConfiguration> cconf) it would be helpful for
>>>> >> spring too.
>>>> >>
>>>> >> >
>>>> >> >------- Forwarded message -------
>>>> >> >From: "Вячеслав Коптилин" < slava.kopti...@gmail.com >
>>>> >> >To: dev@ignite.apache.org
>>>> >> >Cc:
>>>> >> >Subject: Re: [DISCUSSION] Cache warmup
>>>> >> >Date: Mon, 27 Jul 2020 16:47:48 +0300
>>>> >> >
>>>> >> >Hello Kirill,
>>>> >> >
>>>> >> >Thanks a lot for driving this activity. If I am not mistaken, this
>>>> >> >discussion relates to IEP-40.
>>>> >> >
>>>> >> >> I suggest adding a warmup phase after recovery here [1] after [2],
>>>> >> >> before discovery.
>>>> >> >This means that the user's thread, which starts Ignite via
>>>> >> >Ignition.start(), will wait for an additional step - cache warm-up.
>>>> >> >I think this fact has to be clearly mentioned in our documentation (at
>>>> >> >Javadoc at least) because this step can be time-consuming.
>>>> >> >
>>>> >> >> I suggest adding a new interface:
>>>> >> >I would change it a bit. First of all, it would be nice to place this
>>>> >> >interface to a public package and get rid of using GridCacheContext,
>>>> >> >which is an internal class and it should not leak to the public API in
>>>> >> >any case.
>>>> >> >Perhaps, this parameter is not needed at all or we should add some
>>>> >> >public abstraction instead of internal class.
>>>> >> >
>>>> >> >package org.apache.ignite.configuration;
>>>> >> >
>>>> >> >import org.apache.ignite.IgniteCheckedException;
>>>> >> >import org.apache.ignite.lang.IgniteFuture;
>>>> >> >
>>>> >> >public interface CacheWarmupper {
>>>> >> >    /**
>>>> >> >     * Warmup cache.
>>>> >> >     *
>>>> >> >     * @param cachename Cache name.
>>>> >> >     * @return Future cache warmup.
>>>> >> >     * @throws IgniteCheckedException If failed.
>>>> >> >     */
>>>> >> >    IgniteFuture<?> warmup(String cachename) throws IgniteCheckedException;
>>>> >> >}
>>>> >> >
>>>> >> >Thanks,
>>>> >> >S.
>>>> >> >
>>>> >> >Mon, 27 Jul 2020 at 15:03, ткаленко кирилл < tkalkir...@yandex.ru >:
>>>> >> >
>>>> >> >> Now, after restarting a node, we have only cold caches, which on the
>>>> >> >> first requests to them will gradually load data from disks, which can
>>>> >> >> slow down the first calls to them.
>>>> >> >> If a node has more RAM than data on disk, then the data can be loaded
>>>> >> >> at start ("warmup"), thereby solving the issue of slowdowns during the
>>>> >> >> first calls to caches.
>>>> >> >>
>>>> >> >> I suggest adding a warmup phase after recovery here [1] after [2],
>>>> >> >> before discovery.
>>>> >> >>
>>>> >> >> I suggest adding a new interface:
>>>> >> >>
>>>> >> >> package org.apache.ignite.internal.processors.cache;
>>>> >> >>
>>>> >> >> import org.apache.ignite.IgniteCheckedException;
>>>> >> >> import org.apache.ignite.internal.IgniteInternalFuture;
>>>> >> >> import org.jetbrains.annotations.Nullable;
>>>> >> >>
>>>> >> >> /**
>>>> >> >>  * Interface for warming up cache.
>>>> >> >>  */
>>>> >> >> public interface CacheWarmup {
>>>> >> >>     /**
>>>> >> >>      * Warmup cache.
>>>> >> >>      *
>>>> >> >>      * @param cacheCtx Cache context.
>>>> >> >>      * @return Future cache warmup.
>>>> >> >>      * @throws IgniteCheckedException if failed.
>>>> >> >>      */
>>>> >> >>     @Nullable IgniteInternalFuture<?> process(GridCacheContext cacheCtx)
>>>> >> >>         throws IgniteCheckedException;
>>>> >> >> }
>>>> >> >>
>>>> >> >> This will allow warming up caches in parallel and asynchronously. The
>>>> >> >> warmup phase will end once the IgniteInternalFuture of every cache
>>>> >> >> isDone.
>>>> >> >>
>>>> >> >> Also adding the ability to customize via methods:
>>>> >> >>
>>>> >> >> org.apache.ignite.configuration.IgniteConfiguration#setDefaultCacheWarmup
>>>> >> >> org.apache.ignite.configuration.CacheConfiguration#setCacheWarmup
>>>> >> >>
>>>> >> >> This will allow setting an implementation of cache warming up both
>>>> >> >> for a specific cache and for all caches if necessary.
>>>> >> >>
>>>> >> >> I suggest adding an implementation of SequentialWarmup that will use
>>>> >> >> [3].
>>>> >> >>
>>>> >> >> Questions, suggestions, comments?
>>>> >> >>
>>>> >> >> [1] - org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#afterLogicalUpdatesApplied
>>>> >> >> [2] - org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#restorePartitionStates
>>>> >> >> [3] - org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManager.CacheDataStore#preload
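
The original proposal at the bottom of the thread combines two ideas: a per-cache warm-up that falls back to a default one, and a phase that starts all warm-ups asynchronously and ends only when every future is done. A minimal sketch of that flow follows. It is an illustration under assumptions, not Ignite code: it uses java.util.concurrent.CompletableFuture in place of IgniteInternalFuture and a cache name in place of GridCacheContext, and the names WarmUpTask, WarmUpPhase, resolve, run and recording are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the proposed warm-up interface; takes a cache name only. */
interface WarmUpTask {
    /** Starts warm-up of one cache; may return null when there is nothing to wait for. */
    CompletableFuture<Void> process(String cacheName);
}

final class WarmUpPhase {
    private WarmUpPhase() {
    }

    /** Cache-specific warm-up wins; otherwise the default is used (null means no warm-up). */
    static WarmUpTask resolve(WarmUpTask perCache, WarmUpTask dflt) {
        return perCache != null ? perCache : dflt;
    }

    /**
     * Starts warm-up for every cache without waiting in between, then blocks
     * until all returned futures are done.
     *
     * @return Names of caches for which a warm-up was started, in input order.
     */
    static List<String> run(List<String> cacheNames, Map<String, WarmUpTask> perCache, WarmUpTask dflt) {
        List<String> started = new ArrayList<>();
        List<CompletableFuture<Void>> futs = new ArrayList<>();

        for (String name : cacheNames) {
            WarmUpTask task = resolve(perCache.get(name), dflt);

            if (task == null)
                continue; // Neither a cache-specific nor a default warm-up is configured.

            CompletableFuture<Void> fut = task.process(name);

            if (fut != null)
                futs.add(fut);

            started.add(name);
        }

        // The phase ends only after every future has completed.
        CompletableFuture.allOf(futs.toArray(new CompletableFuture[0])).join();

        return started;
    }

    /** Demo task: asynchronously records the cache name instead of loading pages. */
    static WarmUpTask recording(Set<String> sink) {
        return name -> CompletableFuture.runAsync(() -> sink.add(name));
    }
}
```

Because run() blocks the calling thread until all futures complete, it also shows why the thread that starts the node would wait for the warm-up step, as noted earlier in the thread.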