Hello, Igniters!

I've published the benchmark results on the IEP page [1].

Short summary: storing recovery data on checkpoint can give a
throughput boost of about 3-5% in scenarios like IgnitePutBenchmark
with the default key range. In some extreme cases with heavy disk load
it can provide even much better performance. The downside is an
increased checkpoint duration, up to 2x. Checkpoint buffer pages can't
be released while recovery data is being written, which leads to
excessive checkpoint buffer usage, and high checkpoint buffer usage
triggers throttling earlier to protect from checkpoint buffer overflow.
There is also a possible issue with page replacement: if we can't find
any page to replace except pages marked for checkpoint, we can only
replace such pages after the checkpoint marker is stored to disk.
Previously we waited for the marker's future, which completed quickly;
with the new approach there is a long gap between checkpoint start and
the marker store event. So with the new approach we can run into OOM
exceptions in cases where the old approach would only have waited a
short time.
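
If the feature is enabled, the throttling/OOM risk can be reduced by
giving the persistent data region a larger checkpoint buffer. Below is
a minimal sketch of such tuning with the public configuration API (the
4 GB value is only an illustrative assumption, not a recommendation
derived from the benchmarks):

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class CheckpointBufferTuning {
        public static void main(String[] args) {
            DataRegionConfiguration region = new DataRegionConfiguration()
                .setName("persistent-region")
                .setPersistenceEnabled(true)
                // A bigger checkpoint buffer leaves more headroom before
                // throttling kicks in while recovery data is being written.
                .setCheckpointPageBufferSize(4L * 1024 * 1024 * 1024); // 4 GB, illustrative only.

            DataStorageConfiguration storage = new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(region);

            Ignition.start(new IgniteConfiguration().setDataStorageConfiguration(storage));
        }
    }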

In general, the new approach requires more precise tuning of data
regions. So, I propose to include this feature as experimental and
keep it disabled by default for at least one release.
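
How exactly the switch is exposed is open for discussion; as a rough
sketch, I imagine an opt-in flag on DataStorageConfiguration along
these lines (the method name below is only a placeholder to illustrate
the idea, not a final API):

    // Hypothetical opt-in flag, disabled by default; the exact property
    // name and placement are not final.
    DataStorageConfiguration storage = new DataStorageConfiguration()
        .setWriteRecoveryDataOnCheckpoint(true);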

[1]: 
https://cwiki.apache.org/confluence/display/IGNITE/IEP-113+Change+approach+to+store+checkpoint+recovery+data

On Thu, Nov 9, 2023 at 8:05 PM Anton Vinogradov <a...@apache.org> wrote:
>
> Alex, agree to the proposal.
>
> On Thu, Nov 9, 2023 at 5:31 PM Alex Plehanov <plehanov.a...@gmail.com>
> wrote:
>
> > Anton,
> >
> > Async physical logging is the target and most promising solution.
> >
> > In this scenario:
> > 1. Implement the logical and physical records split.
> > 2. Implement async physical logging (actually, already implemented as a PoC).
> > 3. Drop the solution implemented in (1) after some time, if the
> > solution implemented in (2) has no critical issues.
> > In this scenario we do some useless work, which we assume will be
> > dropped soon.
> >
> > Instead, I propose:
> > 1. Implement async physical logging.
> > 2. Drop the old physical logging implementation if (1) has no critical
> > issues after some time.
> > 3. Or implement the logical and physical records split, if critical
> > issues are found in (1).
> > In this case, we proceed to the alternative approach only if the main
> > approach fails.
> >
> > On Thu, Nov 9, 2023 at 1:18 PM Anton Vinogradov <a...@apache.org> wrote:
> > >
> > > In this case, we can split the logs into logical and physical in the
> > > initial fix.
> > > This should not cause any negative side effects.
> > > And then implement async physical logging as an alternative solution?
> > >
> > > On Thu, Nov 9, 2023 at 12:52 PM Alex Plehanov <plehanov.a...@gmail.com>
> > > wrote:
> > >
> > > > Anton,
> > > >
> > > > My concern is not only about compatibility. The new recovery data
> > > > storing approach is not a silver bullet, it has drawbacks as well.
> > > > Also, we can't be sure that the new approach is applicable to all
> > > > environments: increased checkpoint time can lead to throttling or even
> > > > OOM in some cases. So, in my opinion, it's better to keep both
> > > > approaches and allow users to choose between them. We should keep both
> > > > approaches for at least one release after the new approach is enabled
> > > > by default. In case of a critical problem, users can raise the issue
> > > > and switch to the old approach.
> > > >
> > > > On Fri, Nov 3, 2023 at 4:33 PM Anton Vinogradov <a...@apache.org> wrote:
> > > > >
> > > > > Sounds good to me, except for the compatibility proposal.
> > > > > No need to keep the old behaviour. No one will update the node after
> > > > > the crash.
> > > > > The update must happen only after a plain node stop; let's just check
> > > > > this instead of growing the code complexity.
> > > > >
> > > > > On Thu, Nov 2, 2023 at 4:55 PM Alex Plehanov <plehanov.a...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello, Igniters!
> > > > > >
> > > > > > I'd like to discuss the way of storing checkpoint recovery data.
> > > > > > Now, we are writing extra data to WAL files to protect from
> > > > > > failures during checkpoints. Later, we read and write WAL files
> > > > > > with this extra data a couple of times, causing excessive disk
> > > > > > load, which can lead to a performance drop.
> > > > > > We can try to improve this by changing the approach for storing
> > > > > > checkpoint recovery data. I've prepared the IEP [1] with my
> > > > > > proposals. The main idea is to move checkpoint recovery data from
> > > > > > WAL physical records to a file written right before the
> > > > > > checkpoint. Please have a look at the IEP for more information.
> > > > > > I've implemented a PoC [2] for the described ideas. We will
> > > > > > benchmark this PoC soon and I will share the results.
> > > > > >
> > > > > > WDYT about this proposal?
> > > > > >
> > > > > > [1]: https://cwiki.apache.org/confluence/display/IGNITE/IEP-113+Change+approach+to+store+checkpoint+recovery+data
> > > > > > [2]: https://github.com/apache/ignite/pull/11024/files
> > > > > >
> > > >
> >
