Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ilya Lantukh Thu, 22 Mar 2018 05:14:10 -0700

+1 for fixing LOG_ONLY. If the current implementation doesn't protect against
data corruption, it doesn't make sense.
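
To make the distinction in this thread concrete, here is a minimal sketch of
the difference between writing a log record and forcing it to disk. It is not
Ignite's WAL implementation: the class WalSyncSketch, the enum SyncPolicy and
the method append are hypothetical names made up for illustration, and only
the java.nio calls are real. A plain write lands in the OS page cache, which
survives a process crash (kill -9, JVM crash) but not power loss;
FileChannel.force asks the OS to flush to the storage device, which is where
the FSYNC-style cost comes from.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WalSyncSketch {
    /** Hypothetical policies for illustration only; these are not Ignite's WALMode values. */
    public enum SyncPolicy { WRITE_ONLY, FORCE_EACH_RECORD }

    /** Appends one record to the log file and returns the file size after the append. */
    public static long append(Path wal, byte[] record, SyncPolicy policy) throws IOException {
        try (FileChannel ch = FileChannel.open(wal,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record);

            // write() hands the bytes to the OS; they may still sit in the page cache.
            while (buf.hasRemaining())
                ch.write(buf);

            // force(false) requests that file content (not necessarily metadata)
            // be flushed to the storage device before returning.
            if (policy == SyncPolicy.FORCE_EACH_RECORD)
                ch.force(false);

            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path wal = Files.createTempFile("wal-sketch", ".log");
        byte[] rec = "put k=1\n".getBytes(StandardCharsets.UTF_8);

        long afterWrite = append(wal, rec, SyncPolicy.WRITE_ONLY);
        long afterForce = append(wal, rec, SyncPolicy.FORCE_EACH_RECORD);

        System.out.println("size after plain write:  " + afterWrite);
        System.out.println("size after forced write: " + afterForce);

        Files.delete(wal);
    }
}
```

Both calls leave the same bytes in the file; the only difference is when those
bytes are guaranteed to be on the device. The sketch says nothing about how a
partially written record is detected on recovery, which is the part the
proposed fix is about.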


On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <[email protected]> wrote:

> +1 for the fix of LOG_ONLY
>
> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <
> [email protected]> wrote:
>
> > +1 for fixing LOG_ONLY to enforce corruption safety given the provided
> > performance results.
> >
> > 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <[email protected]>:
> >
> > > +1 for accepting the drop in LOG_ONLY. 7% is not that much, and it is not
> > > really a drop at all given that we are fixing a bug: had we implemented it
> > > correctly in the first place, we would never have noticed any "drop".
> > > I do not understand why anyone would want to use the current broken mode.
> > >
> > > On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov <[email protected]>
> > > wrote:
> > >
> > > > Hi, I think option 1 is better. As Val said, any mode that allows
> > > > corruption does not make much sense.
> > > >
> > > > What Ivan described here as a drop is still a significant performance
> > > > boost relative to the old DEFAULT mode (now FSYNC).
> > > >
> > > > Sincerely,
> > > > Dmitriy Pavlov
> > > >
> > > > Wed, 21 Mar 2018 at 17:56, Ivan Rakov <[email protected]>:
> > > >
> > > > > I've attached benchmark results to the JIRA ticket.
> > > > > We observe a ~7% drop in the "fair" LOG_ONLY_SAFE mode, regardless of
> > > > > whether WAL compaction is enabled. That is a pretty significant drop:
> > > > > WAL compaction itself costs only ~3%.
> > > > >
> > > > > I see two options here:
> > > > > 1) Change LOG_ONLY behavior. That implies we are ready to release
> > > > > AI 2.5 with a 7% drop.
> > > > > 2) Introduce LOG_ONLY_SAFE, make it the default, and add a release note
> > > > > to AI 2.5 saying that we added power loss durability in the default
> > > > > mode, but users may fall back to the previous LOG_ONLY in order to
> > > > > retain performance.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Best Regards,
> > > > > Ivan Rakov
> > > > >
> > > > > On 20.03.2018 16:00, Ivan Rakov wrote:
> > > > > > Val,
> > > > > >
> > > > > >> If a storage is in
> > > > > >> corrupted state, does it mean that it needs to be completely removed
> > > > > >> and cluster needs to be restarted without data?
> > > > > >
> > > > > > Yes, there's a chance that in LOG_ONLY all local data will be lost,
> > > > > > but only in the *power loss / OS crash* case.
> > > > > > kill -9, a JVM crash, the death of a critical system thread and all
> > > > > > other cases that usually take place are variations of *process crash*.
> > > > > > All WAL modes (except NONE, of course) ensure corruption safety in
> > > > > > case of a process crash.
> > > > > >
> > > > > >> If so, I'm not sure any mode
> > > > > >> that allows corruption makes much sense to me.
> > > > > > It depends on the performance impact of enforcing power-loss
> > > > > > corruption safety. The price of full protection from power loss is
> > > > > > high: FSYNC is way slower (2-10 times) than the other WAL modes. The
> > > > > > question is whether ensuring weaker guarantees (corruption can't
> > > > > > happen, but loss of the last updates can) will affect performance as
> > > > > > badly as the strong guarantees.
> > > > > > I'll share benchmark results soon.
> > > > > >
> > > > > > Best Regards,
> > > > > > Ivan Rakov
> > > > > >
> > > > > > On 20.03.2018 5:09, Valentin Kulichenko wrote:
> > > > > >> Guys,
> > > > > >>
> > > > > >> What do we mean by "data corruption" here? If a storage is in a
> > > > > >> corrupted state, does it mean that it needs to be completely removed
> > > > > >> and the cluster needs to be restarted without data? If so, I'm not
> > > > > >> sure any mode that allows corruption makes much sense to me. How am I
> > > > > >> supposed to use a database if virtually any failure can end with
> > > > > >> complete loss of data?
> > > > > >>
> > > > > >> In any case, this definitely should not be the default behavior. If a
> > > > > >> user ever switches to a corruption-unsafe mode, there should be a
> > > > > >> clear warning about this.
> > > > > >>
> > > > > >> -Val
> > > > > >>
> > > > > >> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov <[email protected]> wrote:
> > > > > >>
> > > > > >>> Ticket to track changes:
> > > > > >>> https://issues.apache.org/jira/browse/IGNITE-7754
> > > > > >>>
> > > > > >>> Best Regards,
> > > > > >>> Ivan Rakov
> > > > > >>>
> > > > > >>>
> > > > > >>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
> > > > > >>>
> > > > > >>>> On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov <[email protected]> wrote:
> > > > > >>>>
> > > > > >>>> Vladimir,
> > > > > >>>>> Unlike BACKGROUND, LOG_ONLY provides strict write guarantees unless
> > > > > >>>>> power loss has happened.
> > > > > >>>>> It seems we need to measure the performance difference to decide
> > > > > >>>>> whether we need a separate WAL mode. If it is invisible, we'll
> > > > > >>>>> just fix these bugs without introducing a new mode; if it is
> > > > > >>>>> perceptible, we'll continue the discussion about introducing
> > > > > >>>>> LOG_ONLY_SAFE.
> > > > > >>>>> Makes sense?
> > > > > >>>>>
> > > > > >>>>> Yes, this sounds like the right approach.
> > > > > >>>>
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>



-- 
Best regards,
Ilya
