[389-devel] Re: Implementing Referential Integrity by using a queue instead of a file

William Brown Thu, 06 Jul 2017 17:14:21 -0700

On Thu, 2017-07-06 at 14:33 -0400, Mark Reynolds wrote:
> 
> On 07/06/2017 01:07 PM, Ilias Stamatis wrote:
> > Hello,
> >
> > A desire had been expressed to get rid of referint plugin's logfile:
> > https://pagure.io/389-ds-base/issue/49202
> >
> > It finally turns out that this file is used for other purposes than
> > real logging.
> >
> > The referint plugin currently works like this: when the update delay
> > is set to more than 0, a new thread is created that executes the
> > referential integrity code every x seconds (x being the update delay).
> > When a delete or modrdn operation happens, the plugin writes it to its
> > logfile. Every x seconds the plugin then checks the logfile, sees
> > what happened and applies the changes. Finally, it deletes the file,
> > thus clearing the state for the next time it reads from it.
> >
> > After discussing this with William he suggested it's better to replace
> > the file with a queue, since the file involves excess fsync / sync, and
> > has all kinds of potential state/race issues. Using a queue will be
> > much faster as well.
> >
> > William went even further and suggested that we could get rid of the
> > async referint update completely. This probably wouldn't happen soon
> > though, since customers are likely using it. For now we could provide
> > a warning such as "we recommend you set delay to 0".
> >
> > Finally, the referint-logchanges attribute does absolutely nothing. It
> > seems to be completely ignored by the plugin, so we could remove this
> > as well.
> >
> > I'll start working on these changes soon.
> >
> > Any thoughts or objections on the above would be welcome.
> The only problem with going to a queue is if the server goes down
> unexpectedly.  In such a case those RI updates would be lost.


We already have this issue because there is a delay between the change
to the object and the log being sync()ed to disk. So we can already lose
changes here. TBH the only fix is to remove the async model. I actually
question why we still need async/delay processing of the refint
plugin ...
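
For illustration only, here is a rough sketch of what replacing the
logfile with an in-memory queue could look like. It is written in Python
for brevity (the plugin itself is C), and every name in it
(record_change, drain, pending) is hypothetical, not taken from the
plugin source. It assumes the post-op hook enqueues a record and the
delay thread drains the queue every x seconds.

```python
import queue

# Hypothetical stand-in for the referint logfile: an in-memory queue of
# pending (operation, old_dn, new_dn) records.
pending = queue.Queue()


def record_change(op, old_dn, new_dn=None):
    """Called after a delete/modrdn instead of appending a line to the file."""
    pending.put((op, old_dn, new_dn))


def drain(apply_update):
    """Called by the delay thread every x seconds.

    Applies every record queued so far and returns how many were applied.
    """
    applied = 0
    while True:
        try:
            op, old_dn, new_dn = pending.get_nowait()
        except queue.Empty:
            return applied
        apply_update(op, old_dn, new_dn)
        applied += 1
```

Anything still sitting in the queue when the server dies is gone, which
is the same exposure we have today for log lines that were written but
not yet synced.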

> 
> This also brings up a different point...  the RI plugin is a backend txn
> plugin.  If we write changes to a log, and those changes end up failing
> for some reason, then there is no way to rollback the original
> transaction --> breaking the backend txn plugin model.
> 
> Perhaps the log/delay should just be removed?  Or ignore the log/delay
> settings if the plugin is set as a backend txn plugin? 

Completely agree. Because of the delay, if we roll back the txn we still
do the refint check. 
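
A toy model of the rollback problem, again in Python and with entirely
made-up names (Backend, delete_delayed, delete_betxn). It assumes a
transaction can be modelled as "snapshot, change, restore on failure",
which is far simpler than the real backend, but it shows why a deferred
record survives a rollback while in-transaction work does not.

```python
import copy


class Backend:
    """Toy backend: a dict of entries plus a deferred referint work list."""

    def __init__(self):
        self.entries = {
            "uid=a": {},
            "cn=group": {"member": ["uid=a"]},
        }
        self.deferred = []  # hypothetical stand-in for the referint log/queue

    def _strip_references(self, dn):
        # Remove dn from every member list that still points at it.
        for attrs in self.entries.values():
            if dn in attrs.get("member", []):
                attrs["member"].remove(dn)

    def delete_delayed(self, dn, fail=False):
        snapshot = copy.deepcopy(self.entries)
        del self.entries[dn]
        self.deferred.append(dn)     # recorded outside the transaction
        if fail:
            self.entries = snapshot  # rollback restores the entries only

    def delete_betxn(self, dn, fail=False):
        snapshot = copy.deepcopy(self.entries)
        del self.entries[dn]
        self._strip_references(dn)   # referint work is part of the txn
        if fail:
            self.entries = snapshot  # rollback undoes both changes
```

With delete_delayed and fail=True, the entry comes back but "uid=a" is
still in the deferred list, so the later refint pass would strip
references to an entry that was never deleted.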

I would be fully in support of removing the delay option and going betxn
for the plugin only. This delay behaviour is the reason we advise you
only run refint on one master in a topology, whereas if we remove this and
go betxn, we can run on all masters correctly. I think we would need to
make the plugin ignore replicated ops then too. 

My only concern would be what version to have this change land in - as
much as I'm excited to make the change we should be careful.

Perhaps we remove the delay processing, and have the "delay" process
flag act as a switch to check incoming repl ops? Because today if you
have delay > 0, you likely have refint on one master, so we need to
refint incoming repl ops. If you have delay 0, you ignore repl ops
because you assume all masters have refint?
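
Spelled out as a sketch (Python, hypothetical function name, and only
the two cases described above are modelled), the switch would be
something like:

```python
def should_run_referint(is_replicated_op, delay):
    """Hypothetical check: does this server apply referint to the operation?"""
    if not is_replicated_op:
        # Operations received directly from a client are always checked.
        return True
    # delay > 0 suggests referint runs on a single master, so that master
    # must also fix up references for changes arriving via replication.
    # delay == 0 is taken to mean every master runs referint itself, so
    # replicated ops have already been handled at their origin.
    return delay > 0
```

This is just the proposal restated, not existing plugin behaviour.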

No matter what, it's not a smooth upgrade process here, but I think long
term it's nicer to just have it on "all masters". 

-- 
Sincerely,

William Brown
Software Engineer
Red Hat, Australia/Brisbane


