Re: [PROPOSAL] Change approach to store checkpoint recovery data

Alex Plehanov Thu, 09 Nov 2023 01:52:33 -0800

Anton,

My concern is not only about compatibility. The new approach to
storing recovery data is not a silver bullet; it has drawbacks as well.
Also, we can't be sure that the new approach is suitable for all
environments: increased checkpoint time can lead to throttling or even
OOM in some cases. So, in my opinion, it's better to keep both
approaches and allow users to choose between them. We should keep both
approaches for at least one release after the new approach is
enabled by default. In case of a critical problem, users can report the
issue and switch back to the old approach.


Fri, 3 Nov 2023 at 16:33, Anton Vinogradov <a...@apache.org>:
>
> Sounds good to me, except the compatibility proposal.
> No need to keep the old behaviour. No one will update the node after a
> crash.
> An update must happen only after a plain node stop, so let's just check
> for this instead of growing the code complexity.
>
> On Thu, Nov 2, 2023 at 4:55 PM Alex Plehanov <plehanov.a...@gmail.com>
> wrote:
>
> > Hello, Igniters!
> >
> > I'd like to discuss the way of storing checkpoint recovery data.
> > Currently, we write extra data to WAL files to protect against failures
> > during checkpoints. Later, we read and write WAL files with this extra
> > data a couple of times, causing excessive disk load, which can lead to
> > a performance drop.
> > We can try to improve this by changing the approach for storing
> > checkpoint recovery data. I've prepared an IEP [1] with my proposals.
> > The main idea is to move checkpoint recovery data from WAL physical
> > records to a file written right before the checkpoint. Please have
> > a look at the IEP for more information.
> > I've implemented a PoC [2] for the described ideas. We will benchmark
> > this PoC soon and I will share the results.
> >
> > WDYT about this proposal?
> >
> > [1]:
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-113+Change+approach+to+store+checkpoint+recovery+data
> > [2]: https://github.com/apache/ignite/pull/11024/files
> >
