Hi, Stas!

After talking with Anton and Alexy about "IEP-40", I changed the description 
of the implementation in the form of a response to Slava, here [1]. In short, 
I made three separate interfaces: the first is public, for strategy 
configuration; the second is internal, for strategy implementation; and the 
third is for possible delivery of strategies from different plugins.
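
To make the split easier to picture, here is a minimal sketch of the three 
parts in plain Java. It does not depend on Ignite, and every name in it 
(WarmUpConfiguration, WarmUpStrategySupplier, LoadAllWarmUpStrategy, 
WarmUpStrategyRegistry) is an illustrative assumption, not a final API; the 
"warm-up" only records the target name instead of reading pages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Public part (assumed name): what a user puts into the node configuration. */
interface WarmUpConfiguration {
    // Marker interface; concrete configurations would add their own settings.
}

/** Internal part: the code that performs the warm-up for one configuration type. */
interface WarmUpStrategy<T extends WarmUpConfiguration> {
    /** Configuration class this strategy is responsible for. */
    Class<T> configClass();

    /** Warms up the given target (just a name in this sketch). */
    void warmUp(T cfg, String target);
}

/** Plugin part (assumed name): a plugin returns the extra strategies it ships. */
interface WarmUpStrategySupplier {
    List<WarmUpStrategy<?>> strategies();
}

/** Hypothetical "load everything" configuration without any settings. */
final class LoadAllWarmUpConfiguration implements WarmUpConfiguration {
}

/** Hypothetical strategy: records targets instead of really loading pages. */
final class LoadAllWarmUpStrategy implements WarmUpStrategy<LoadAllWarmUpConfiguration> {
    final List<String> warmedUp = new ArrayList<>();

    @Override public Class<LoadAllWarmUpConfiguration> configClass() {
        return LoadAllWarmUpConfiguration.class;
    }

    @Override public void warmUp(LoadAllWarmUpConfiguration cfg, String target) {
        warmedUp.add(target);
    }
}

/** Collects strategies from all suppliers and dispatches by configuration class. */
final class WarmUpStrategyRegistry {
    private final Map<Class<?>, WarmUpStrategy<?>> byCfgCls = new HashMap<>();

    WarmUpStrategyRegistry(List<WarmUpStrategySupplier> suppliers) {
        for (WarmUpStrategySupplier supplier : suppliers)
            for (WarmUpStrategy<?> strategy : supplier.strategies())
                byCfgCls.put(strategy.configClass(), strategy);
    }

    @SuppressWarnings("unchecked")
    <T extends WarmUpConfiguration> void warmUp(T cfg, String target) {
        WarmUpStrategy<T> strategy = (WarmUpStrategy<T>)byCfgCls.get(cfg.getClass());

        if (strategy == null)
            throw new IllegalArgumentException("No strategy for " + cfg.getClass().getName());

        strategy.warmUp(cfg, target);
    }
}
```

With this shape the user only ever sees the configuration class, while the 
strategy itself can stay in internal packages or come from a plugin.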

I will try to think about this and implement it. The warm-up phase will run 
before "discovery", so I'm not sure that it will be possible to connect via 
control.sh. Perhaps it will be possible via JMX, although I think control.sh 
would be the better option.
> Will there be a way to interrupt warmup phase and continue startup (e.g. via 
> JMX, REST and/or control.sh)? Can we have it please?
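
On the interruption itself, a minimal sketch, assuming a hypothetical 
StoppableWarmUp class: the strategy checks a stop flag between pages, so 
whatever management channel ends up being available (JMX or control.sh) would 
only need to call stop(). The page loader is passed in as a callback purely to 
keep the sketch free of Ignite internals.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

/** Hypothetical warm-up loop that can be interrupted from another thread. */
final class StoppableWarmUp {
    /** Set once by stop(); read before every page load. */
    private final AtomicBoolean stopped = new AtomicBoolean();

    /** Stand-in for the real page loading code. */
    private final IntConsumer pageLoader;

    StoppableWarmUp(IntConsumer pageLoader) {
        this.pageLoader = pageLoader;
    }

    /**
     * Loads pages one by one until all are loaded or stop() was called.
     *
     * @return Number of pages actually loaded.
     */
    int warmUp(int totalPages) {
        int loaded = 0;

        for (int pageIdx = 0; pageIdx < totalPages && !stopped.get(); pageIdx++) {
            pageLoader.accept(pageIdx);

            loaded++;
        }

        return loaded;
    }

    /** Requests the warm-up to finish after the page currently being loaded. */
    void stop() {
        stopped.set(true);
    }
}
```

The node would then continue its startup with whatever has been loaded so far.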

I was thinking about how and where to configure the warm-up, and I think it 
would be better to do it in IgniteConfiguration, since each strategy can work 
for caches, groups, regions, etc.
> I think that ideally warmup should be configured per-cache - I believe this 
> is what a user would expect to do.
> However, cache configs are immutable. We need a way for existing users to 
> enjoy the cache warmup feature, as well as for early adopters to switch to 
> more sophisticated strategies as they will be released (or as their 
> dataset grows).
> Because of that I propose to add the cache warmup configuration to the 
> DataRegionConfiguration. Data regions can be changed between restarts, 
> independently on each node allowing for a rolling change.
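
To compare the two options, a minimal sketch, assuming a hypothetical 
WarmUpSettings holder (not an Ignite class): a node-wide default strategy plus 
an optional per-region override, with the override winning. Strategy names are 
plain strings here only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical holder of warm-up settings for one node. */
final class WarmUpSettings {
    /** Node-wide default strategy name, null if warm-up is off by default. */
    private String dfltStrategy;

    /** Per-region overrides: region name -> strategy name. */
    private final Map<String, String> perRegion = new HashMap<>();

    WarmUpSettings setDefaultStrategy(String name) {
        dfltStrategy = name;

        return this;
    }

    WarmUpSettings setRegionStrategy(String region, String name) {
        perRegion.put(region, name);

        return this;
    }

    /** Region override wins; otherwise the node-wide default is used, if any. */
    Optional<String> resolve(String region) {
        String name = perRegion.get(region);

        return Optional.ofNullable(name != null ? name : dfltStrategy);
    }
}
```

Since such settings live in node configuration, they could be changed between 
restarts without touching the immutable cache configurations.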

Possible.
> Will preloadPartition() method be deprecated together with this change? I 
> assume yes?

I think it can be done as a new strategy, but this is at the discretion of 
the developers.
> How hard would it be to implement a "load all indexes, metapages and 
> freelists" strategy in addition to the "load everything"?
> I think it would be an MVP for environments with datasets larger than RAM. 
> A "load everything" strategy will not work in these environments pretty much 
> at all, 
> and "load indexes" will be a significant improvement over no warmup at all.
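
The selection step of such a strategy could look roughly like the sketch 
below. It assumes hypothetical PageKind and PageRef types and a page budget 
parameter; how the page kinds would really be discovered on disk is left open.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical page classification used only by this sketch. */
enum PageKind { INDEX, META, FREE_LIST, DATA }

/** Hypothetical reference to a page stored on disk. */
final class PageRef {
    final long id;
    final PageKind kind;

    PageRef(long id, PageKind kind) {
        this.id = id;
        this.kind = kind;
    }
}

/** Picks index, meta and free-list pages and skips data pages entirely. */
final class IndexPageSelector {
    /**
     * @param onDisk Pages found on disk, in any order.
     * @param maxPages Upper bound on the number of pages to load into memory.
     * @return Ids of the non-data pages to load, in their original order.
     */
    static List<Long> select(List<PageRef> onDisk, int maxPages) {
        return onDisk.stream()
            .filter(p -> p.kind != PageKind.DATA)
            .limit(maxPages)
            .map(p -> p.id)
            .collect(Collectors.toList());
    }
}
```

Because data pages are never selected, the amount of memory touched depends on 
the size of the indexes and not on the size of the whole dataset.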

[1] - 
http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Cache-warmup-td48582.html#a48649


04.08.2020, 23:22, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
> Kirill,
>
> Thanks for driving this. This is awaited by many users.
>
> A few comments and questions.
>
> I would keep CacheWarmup interface purely internal and never view it as an 
> interface which a user would be implementing.
> There are multiple reasons for that:
> - The logic of the cache warmup is very low-level; how a user is supposed to 
> know which pages they want?
> - A sophisticated strategy will require accessing private APIs for sure; say, 
> I need a strategy which loads the last known memory state before the restart; 
> how can I even implement that without breaking into various internals?
> - In fact there aren't many implementations which make sense ("load 
> everything", "load indexes", "load last memory state", "load N GB at 
> random"); every use case I've seen would be solved by a "load everything" 
> strategy (if disk is < RAM) or "load last memory state" strategy
> - Warmup will be a critical phase, and a custom user implementation is all 
> too likely to cause issues. We should avoid executing user code in critical 
> stages if we can help it
> To summarize, if we give warmup strategies in users' hands they will be hard 
> to write, will require breaking into internals or a lot of additional public 
> interfaces for these internals, will likely cause issues with the cluster, 
> and everyone will be implementing the same few general strategies.
> Basically, I expect only fellow Ignite developers to be implementing their 
> own strategies.
> Because of that I propose to keep the interfaces private, and only give a 
> single public parameter. The parameter can take an enum of the supported 
> strategies. New useful strategies should be added to Ignite codebase.
>
> Will there be a way to interrupt warmup phase and continue startup (e.g. via 
> JMX, REST and/or control.sh)? Can we have it please?
>
> I think that ideally warmup should be configured per-cache - I believe this 
> is what a user would expect to do.
> However, cache configs are immutable. We need a way for existing users to 
> enjoy the cache warmup feature, as well as for early adopters to switch to 
> more sophisticated strategies as they will be released (or as their dataset 
> grows).
> Because of that I propose to add the cache warmup configuration to the 
> DataRegionConfiguration. Data regions can be changed between restarts, 
> independently on each node allowing for a rolling change.
>
> Will preloadPartition() method be deprecated together with this change? I 
> assume yes?
>
> How hard would it be to implement a "load all indexes, metapages and 
> freelists" strategy in addition to the "load everything"?
> I think it would be an MVP for environments with datasets larger than RAM. 
> A "load everything" strategy will not work in these environments pretty much 
> at all, and "load indexes" will be a significant improvement over no warmup 
> at all.
>
> Thanks,
> Stan
>
>>  On 4 Aug 2020, at 16:04, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>
>>  Hi, Denis!
>>
>>  For now, I suggest a simple warm-up implementation for the case when the 
>> persistent storage is smaller than RAM. If others want additional 
>> implementations, they can add them by implementing the interfaces. For the 
>> first point, we need to figure out how and where we will remember pages, 
>> etc. Perhaps such tasks will require improvements in the kernel.
>>
>>  In the "WarmUpStrategy#warmUp" method we get "GridKernalContext#cache", 
>> from which we can get caches and groups through 
>> "GridCacheProcessor#cacheGroups", "GridCacheProcessor#caches" and so on, and 
>> through them we can access pages.
>>>  The second one requires direct work with data pages, but not with a cache
>>>  context, so it's also impossible to implement.
>>
>>  This requires writing additional custom code, which may run longer due to 
>> its SQL features, and so on.
>>  It would be more convenient to just set a warm-up strategy for both 
>> developer and grid administrator.
>>>  When loading of all cache data is required, it can be done by running a
>>>  local scan query. It will iterate through all data pages and result in
>>>  their allocation in memory.
>>
>>  04.08.2020, 15:25, "Denis Mekhanikov" <dmekhani...@gmail.com>:
>>>  Kirill,
>>>
>>>  When I discussed this functionality with Ignite users, I heard the
>>>  following thoughts about warming up:
>>>
>>>     - Node restarts affect performance of queries. The main reason for that
>>>     is that the pages that were loaded into memory before the restart are on
>>>     disk after the restart. It takes time to reach the same distribution of
>>>     data between memory and disk. Until that point the performance is usually
>>>     degraded. No simple rule like "load everything" helps here if only a part
>>>     of data fits in memory.
>>>     - It would be nice to have a way to give preferences to indices when
>>>     doing a warmup. Usually indices are used more often than data nodes, so
>>>     loading indices first would bring more benefits.
>>>
>>>  The first point can be addressed by implementing the policy that would
>>>  restore the memory state that was observed before the restart. I don't see
>>>  how it can be implemented using the suggested interface.
>>>  The second one requires direct work with data pages, but not with a cache
>>>  context, so it's also impossible to implement.
>>>
>>>  When loading of all cache data is required, it can be done by running a
>>>  local scan query. It will iterate through all data pages and result in
>>>  their allocation in memory.
>>>
>>>  So, I don't really see a scenario when the suggested API will help. Do you
>>>  have a suitable use-case that will be covered?
>>>
>>>  Denis
>>>
>>>  Tue, 4 Aug 2020 at 13:42, ткаленко кирилл <tkalkir...@yandex.ru>:
>>>
>>>>   Hi, Denis!
>>>>
>>>>   Previously, I answered Slava about the implementation I have in mind;
>>>>   now it will be possible to add custom warm-up strategy implementations,
>>>>   which can be implemented in different ways.
>>>>
>>>>   At the moment, I suggest implementing one "Load all" strategy, which will
>>>>   be effective if persistent storage is less than RAM.
>>>>
>>>>   28.07.2020, 19:46, "Denis Mekhanikov" <dmekhani...@gmail.com>:
>>>>   > Kirill,
>>>>   >
>>>>   > That will be a great feature! Other popular databases already have it
>>>>   > (e.g. Postgres: https://www.postgresql.org/docs/11/pgprewarm.html), so
>>>>   > it's good that we're also going to have it in Ignite.
>>>>   >
>>>>   > What implementation of CacheWarmup interface do you have in mind? Will
>>>>   > there be some preconfigured implementation, and will users be able to
>>>>   > implement it themselves?
>>>>   >
>>>>   > Do you think it should be cache-based? I would say that a
>>>>   > DataRegion-based warm-up would come more naturally. Page IDs that are
>>>>   > loaded into the data region can be dumped periodically to disk and
>>>>   > recovered on restarts. This is more or less how it works in Postgres.
>>>>   > I'm afraid that if we make it cache-based, the implementation won't be
>>>>   > that obvious. We already have an API for warmup that appeared to be
>>>>   > pretty much impossible to apply in a useful way:
>>>>   >
>>>>   > https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteCache.html#preloadPartition-int-
>>>>   > Let's make sure that our new tool for warming up is actually useful.
>>>>   >
>>>>   > Denis
>>>>   >
>>>>   > Tue, 28 Jul 2020 at 09:17, Zhenya Stanilovsky
>>>>   > <arzamas...@mail.ru.invalid>:
>>>>   >
>>>>   >> Looks like we need an additional function for static caches, for
>>>>   >> example: warmup(List<CacheConfiguration> cconf); it would be helpful
>>>>   >> for Spring too.
>>>>   >>
>>>>   >> >
>>>>   >> >------- Forwarded message -------
>>>>   >> >From: "Вячеслав Коптилин" < slava.kopti...@gmail.com >
>>>>   >> >To: dev@ignite.apache.org
>>>>   >> >Cc:
>>>>   >> >Subject: Re: [DISCUSSION] Cache warmup
>>>>   >> >Date: Mon, 27 Jul 2020 16:47:48 +0300
>>>>   >> >
>>>>   >> >Hello Kirill,
>>>>   >> >
>>>>   >> >Thanks a lot for driving this activity. If I am not mistaken, this
>>>>   >> >discussion relates to IEP-40.
>>>>   >> >
>>>>   >> >> I suggest adding a warmup phase after recovery here [1] after [2],
>>>>   >> >> before discovery.
>>>>   >> >This means that the user's thread, which starts Ignite via
>>>>   >> >Ignition.start(), will wait for an additional step - cache warm-up.
>>>>   >> >I think this fact has to be clearly mentioned in our documentation
>>>>   >> >(in the Javadoc at least) because this step can be time-consuming.
>>>>   >> >
>>>>   >> >> I suggest adding a new interface:
>>>>   >> >I would change it a bit. First of all, it would be nice to place this
>>>>   >> >interface in a public package and get rid of GridCacheContext, which
>>>>   >> >is an internal class and should not leak into the public API in any
>>>>   >> >case.
>>>>   >> >Perhaps this parameter is not needed at all, or we should add some
>>>>   >> >public abstraction instead of the internal class.
>>>>   >> >
>>>>   >> >package org.apache.ignite.configuration;
>>>>   >> >
>>>>   >> >import org.apache.ignite.IgniteCheckedException;
>>>>   >> >import org.apache.ignite.lang.IgniteFuture;
>>>>   >> >
>>>>   >> >public interface CacheWarmupper {
>>>>   >> > /**
>>>>   >> > * Warmup cache.
>>>>   >> > *
>>>>   >> > * @param cachename Cache name.
>>>>   >> > * @return Future cache warmup.
>>>>   >> > * @throws IgniteCheckedException If failed.
>>>>   >> > */
>>>>   >> > IgniteFuture<?> warmup(String cachename) throws IgniteCheckedException;
>>>>   >> >}
>>>>   >> >
>>>>   >> >Thanks,
>>>>   >> >S.
>>>>   >> >
>>>>   >> >Mon, 27 Jul 2020 at 15:03, ткаленко кирилл < tkalkir...@yandex.ru >:
>>>>   >> >
>>>>   >> >> Currently, after a node restart we have only cold caches, which
>>>>   >> >> gradually load data from disk on the first requests to them; this
>>>>   >> >> can slow down the first calls.
>>>>   >> >> If a node has more RAM than data on disk, the data can be loaded at
>>>>   >> >> start ("warmup"), thereby solving the issue of slowdowns during the
>>>>   >> >> first calls to caches.
>>>>   >> >>
>>>>   >> >> I suggest adding a warmup phase after recovery here [1] after [2],
>>>>   >> >> before discovery.
>>>>   >> >>
>>>>   >> >> I suggest adding a new interface:
>>>>   >> >>
>>>>   >> >> package org.apache.ignite.internal.processors.cache;
>>>>   >> >>
>>>>   >> >> import org.apache.ignite.IgniteCheckedException;
>>>>   >> >> import org.apache.ignite.internal.IgniteInternalFuture;
>>>>   >> >> import org.jetbrains.annotations.Nullable;
>>>>   >> >>
>>>>   >> >> /**
>>>>   >> >> * Interface for warming up cache.
>>>>   >> >> */
>>>>   >> >> public interface CacheWarmup {
>>>>   >> >> /**
>>>>   >> >> * Warmup cache.
>>>>   >> >> *
>>>>   >> >> * @param cacheCtx Cache context.
>>>>   >> >> * @return Future cache warmup.
>>>>   >> >> * @throws IgniteCheckedException if failed.
>>>>   >> >> */
>>>>   >> >> @Nullable IgniteInternalFuture<?> process(GridCacheContext cacheCtx)
>>>>   >> >> throws IgniteCheckedException;
>>>>   >> >> }
>>>>   >> >>
>>>>   >> >> This will allow warming up caches in parallel and asynchronously.
>>>>   >> >> The warmup phase will end after the IgniteInternalFuture of every
>>>>   >> >> cache isDone.
>>>>   >> >>
>>>>   >> >> I also suggest adding the ability to customize it via the methods:
>>>>   >> >>
>>>>   >> >> org.apache.ignite.configuration.IgniteConfiguration#setDefaultCacheWarmup
>>>>   >> >> org.apache.ignite.configuration.CacheConfiguration#setCacheWarmup
>>>>   >> >>
>>>>   >> >> This will allow setting a cache warm-up implementation both for a
>>>>   >> >> specific cache and for all caches if necessary.
>>>>   >> >>
>>>>   >> >> I suggest adding an implementation of SequentialWarmup that will use
>>>>   >> >> [3].
>>>>   >> >>
>>>>   >> >> Questions, suggestions, comments?
>>>>   >> >>
>>>>   >> >> [1] -
>>>>   >> >> org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#afterLogicalUpdatesApplied
>>>>   >> >> [2] -
>>>>   >> >> org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#restorePartitionStates
>>>>   >> >> [3] -
>>>>   >> >> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManager.CacheDataStore#preload
