Hi,
I suspect that Kees discovered a new bug ...

As the retro changelog is persistent across reboot, it is normal that their
timestamp are absolute, so IMHO there is no way relative time should be
used/seen in trimming computation.
(in other words cur_time is wrong. )
My suspicion is that we got a regression when the slapi_eq timers where
rewritten to use relative time instead of absolute time) (cur_time is
provided by slapi_eq API and is now relative ( it should rather be
initialized to time(0) )
The funny thing is that nobody have seen that retrocl trimming was broken
for so long ...

Regards Pierre,

On Wed, Aug 11, 2021 at 11:50 AM Kees Bakker <[email protected]> wrote:

> On 09-08-2021 20:23, Mark Reynolds wrote:
> > On 8/9/21 1:47 PM, Kees Bakker wrote:
> >> On 09-08-2021 19:25, Mark Reynolds wrote:
> >>> On 8/9/21 1:16 PM, Kees Bakker wrote:
> >>>> On 09-08-2021 18:43, Mark Reynolds wrote:
> >>>>> On 8/9/21 11:20 AM, Kees Bakker wrote:
> >>>>>> On 09-08-2021 16:00, Mark Reynolds wrote:
> >>>>>>> On 8/9/21 8:09 AM, Kees Bakker wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> When my dirsrv was trying to compact the databases I was getting
> >>>>>>>> this
> >>>>>>>> error
> >>>>>>>>
> >>>>>>>> [07/Aug/2021:23:59:02.715984489 +0200] - NOTICE - bdb_compact -
> >>>>>>>> Compacting databases ...
> >>>>>>>> [07/Aug/2021:23:59:02.765932397 +0200] - NOTICE - bdb_compact -
> >>>>>>>> Compacting DB start: userRoot
> >>>>>>>> [07/Aug/2021:23:59:03.518175414 +0200] - NOTICE - bdb_compact -
> >>>>>>>> compactdb: compact userRoot - 417 pages freed
> >>>>>>>> [07/Aug/2021:23:59:03.576427786 +0200] - NOTICE - bdb_compact -
> >>>>>>>> Compacting DB start: ipaca
> >>>>>>>> [07/Aug/2021:23:59:03.659941533 +0200] - NOTICE - bdb_compact -
> >>>>>>>> compactdb: compact ipaca - 419 pages freed
> >>>>>>>> [07/Aug/2021:23:59:03.718445310 +0200] - NOTICE - bdb_compact -
> >>>>>>>> Compacting DB start: changelog
> >>>>>>>> [08/Aug/2021:00:00:40.807571334 +0200] - NOTICE -
> >>>>>>>> NSMMReplicationPlugin - changelog program - cl5CompactDBs -
> >>>>>>>> compacting
> >>>>>>>> replication changelogs...
> >>>>>>>> [08/Aug/2021:00:00:54.309357211 +0200] - ERR - libdb - BDB2055
> Lock
> >>>>>>>> table is out of available lock entries
> >>>>>>>> [08/Aug/2021:00:00:54.726504736 +0200] - ERR - bdb_compact -
> >>>>>>>> compactdb: failed to compact changelog; db error - 12 Cannot
> >>>>>>>> allocate
> >>>>>>>> memory
> >>>>>>>> [08/Aug/2021:00:00:54.801571421 +0200] - ERR - libdb - BDB2055
> Lock
> >>>>>>>> table is out of available lock entries
> >>>>>>>> [08/Aug/2021:00:00:54.876618702 +0200] - ERR -
> >>>>>>>> NSMMReplicationPlugin -
> >>>>>>>> changelog program - cl5CompactDBs - Failed to compact
> >>>>>>>> a797bb0b-be1d11eb-88c0b677-613aa2ad; db error - 12 Cannot allocate
> >>>>>>>> memory
> >>>>>>>> [08/Aug/2021:00:00:57.253006449 +0200] - NOTICE - bdb_compact -
> >>>>>>>> Compacting databases finished.
> >>>>>>>>
> >>>>>>>> There are about 402k entries in cn=changelog.
> >>>>>>>>
> >>>>>>>> I have a few questions
> >>>>>>>> 1) is it normal to have so many entries in cn=changelog? On
> another
> >>>>>>>> replica I have almost 3M entries Isn't this cleaned up?
> >>>>>>>> 2) the number of locks is 50000 (there are two config items).
> >>>>>>>> Should I
> >>>>>>>> increase that number? If so, increase to what?
> >>>>>>>> 3) is there maybe something else going on, causing the
> >>>>>>>> exhaustion of
> >>>>>>>> the locks?
> >>>>>>>
> >>>>>>> Ok, so by default there is no changelog trimming enabled. So the
> >>>>>>> changelog will grow without bounds, which is bad.
> >>>>>>
> >>>>>> How much of this [1] applies? Indeed it says "By default the
> >>>>>> Replication Changelog does not use any trimming by default."
> >>>>>> So, how is the trimming actually enabled? I have these entries
> >>>>>>
> >>>>>> dn: cn=changelog5,cn=config
> >>>>>> cn: changelog5
> >>>>>> nsslapd-changelogdir: /var/lib/dirsrv/slapd-EXAMPLE-COM/cldb
> >>>>>> nsslapd-changelogmaxage: 30d
> >>>>>> objectClass: top
> >>>>>> objectClass: extensibleobject
> >>>>>
> >>>>> So trimming is set, but it's set to 30 days (30d), that
> >>>>> could/should be
> >>>>> shortened to "7d".
> >>>>
> >>>> Somehow I doubt that this will do anything.
> >>>
> >>> Ahhh, I thought this was the Replication Changelog, but it is the
> >>> "Retro" Changelog.  Ok so check this out:
> >>>
> >>>
> https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/managing_replication-using_the_retro_changelog_plug_in#Using_the_Retro_Changelog_Plug_in-Trimming_the_Retro_Changelog
> >>>
> >>>
> >>> Or update config directly using ldapmodify:
> >>>
> >>> # ldapmodify -D ...
> >>> dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
> >>> changetype: modify
> >>> replace: nsslapd-changelogmaxage: 7d
> >>> nsslapd-changelogmaxage: 7d
> >>
> >> (( Typo? The replace line is without the value ))
> > yes, sorry...
> >>
> >> Hmm, that maxage was already at 2 days
> >
> > Ok, perhaps you are running into a known bug where indexes created by
> > plugins are not enabled.  Or might be a config issue?
> >
> > Try this first.  Set in the retro changelog entry:
> >
> > nsslapd-changelog-trim-interval: 300   <--- 5 minutes
> (Sorry to be picky, but it is nsslapd-changelogtrim-interval. It's default
> is 300.)
> >
> > Restart the server to make sure it was applied.  You can enable "plugin"
> > logging to see what it does.
> >
> > nsslapd-errorlog-level: 65536
> >
> > wait over 5 minutes, then grep the errors log for "DSRetroclPlugin".
> > This will tell us what the plugin is doing when it goes to trim the
> > database.
> >
> > IMPORTANT, disable plugin logging immediately after you get your data.
> > Set the errorlog level to 0 to go back to default logging.
> >
> > If the logging shows it's not finding any candidates then it might be
> > the indexing issue I mentioned above.  In that case you just need to
> > reindex the retro changelog:
>
> Before changing anything I enabled Plug-in debugging which revealed
> something interesting
>
> [root@linge ~]# grep trim
> /var/log/dirsrv/slapd-GHS-NL/errors-plugin-debug-20210811-1108
> [11/Aug/2021:11:01:49.792450548 +0200] - DEBUG - DSRetroclPlugin - cltrim:
> ldrc=0, first_time=1622032141, cur_time=3702600
> [11/Aug/2021:11:01:49.842339467 +0200] - DEBUG - DSRetroclPlugin -
> retrocl_housekeeping - changelog does not need to be trimmed
> [11/Aug/2021:11:06:49.953401407 +0200] - DEBUG - DSRetroclPlugin - cltrim:
> ldrc=0, first_time=1622032141, cur_time=3702900
> [11/Aug/2021:11:06:50.012249703 +0200] - DEBUG - DSRetroclPlugin -
> retrocl_housekeeping - changelog does not need to be trimmed
>
> This shows that the interval is 300 (even if
> nsslapd-changelogtrim-interval is not explicitly set in
> cn=changelog5,cn=config
>
> Look at cur_time. That is the "uptime" of the system, not UTC.
>
> [root@linge ~]# uptime
>   11:21:42 up 42 days, 20:49,  2 users,  load average: 0.15, 0.14, 0.16
>
> Next, assuming this code from retrocl_trim.c is making the decision to trim
>
>              first_time = retrocl_getchangetime(SLAPI_SEQ_FIRST, &ldrc);
>              slapi_log_err(SLAPI_LOG_PLUGIN, RETROCL_PLUGIN_NAME,
>                            "cltrim: ldrc=%d, first_time=%ld,
> cur_time=%ld\n",
>                            ldrc, first_time, cur_time);
>              if (LDAP_SUCCESS == ldrc && first_time > (time_t)0L &&
>                  first_time + ts.ts_c_max_age < cur_time) {
>                  must_trim = 1;
>              }
>
> There is a mixup of absolute (first_time) and relative (cur_time) time.
> Which of the two is wrong?
>
> >
> > # dsctl slapd-YOUR_INSTANCE stop
> >
> > # dsctl slapd-YOUR_INSTANCE db2index changelog
> >
> > # dsctl slapd-YOUR_INSTANCE start
> >
> > Then wait 5 minutes and see if anything was trimmed.
> >
> > Mark
> >
> >>
> >> dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
> >> cn: Retro Changelog Plugin
> >> nsslapd-attribute: nsuniqueid:targetUniqueId
> >> nsslapd-changelogmaxage: 2d
> >> nsslapd-include-suffix: cn=dns,dc=example,dc=com
> >> nsslapd-plugin-depends-on-named: Class of Service
> >> nsslapd-plugin-depends-on-type: database
> >> nsslapd-pluginDescription: Retrocl Plugin
> >> nsslapd-pluginEnabled: on
> >> nsslapd-pluginId: retrocl
> >> nsslapd-pluginInitfunc: retrocl_plugin_init
> >> nsslapd-pluginPath: libretrocl-plugin
> >> nsslapd-pluginType: object
> >> nsslapd-pluginVendor: 389 Project
> >> nsslapd-pluginVersion: 1.4.3.231
> >> nsslapd-pluginbetxn: on
> >> nsslapd-pluginprecedence: 25
> >> objectClass: top
> >> objectClass: nsSlapdPlugin
> >> objectClass: extensibleObject
> >>
> >>>
> >>>
> >>> HTH,
> >>>
> >>> Mark
> >>>
> >>>>
> >>>> The first entry in cn=changelog is more than two months old. (May 26th
> >>>> is about
> >>>> the time I installed this server as a FreeIPA replica.)
> >>>>
> >>>> dn: changenumber=1,cn=changelog
> >>>> objectClass: top
> >>>> objectClass: changelogentry
> >>>> objectClass: extensibleObject
> >>>> targetuniqueid: 1e89ebcb-e20111e8-8820f96e-fc0c60a4
> >>>> changeNumber: 1
> >>>> targetDn:
> >>>> idnsname=_ldap._tcp,idnsname=example.com.,cn=dns,dc=example,dc=com
> >>>> changeTime: 20210526122901Z
> >>>> changeType: modify
> >>>> changes:: YWRkOiBzUlZSZWNvc...gplbnRyeXVzbjogMTEyCi0KAA==
> >>>>
> >>>> Is there a way to see trimming in action? I'm afraid it is not doing
> >>>> anything.
> >>
> >> What diagnostic can I enable to see the trimming in action?
> >>
> >>>>
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> I recommend setting up the changelog max age to 7 days:
> >>>>>>>
> >>>>>>>
> https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/trimming_the_replication_changelog
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Once that the trimming cleans up the changelog the database
> >>>>>>> compaction
> >>>>>>> should work without issue.  You can also increase the database
> >>>>>>> locks to
> >>>>>>> 1 million (that does require a restart of the server to take
> >>>>>>> effect).
> >>>>>>
> >>>>>> Let's hope that 1 million is enough :-)
> >>>>>
> >>>>> It might not be, you can keep bumping it up if needed. There is no
> >>>>> limit, or negative impact on db performance.
> >>>>>
> >>>>> Mark
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> HTH,
> >>>>>>>
> >>>>>>> Mark
> >>>>>>>
> >>>>>> [1]
> >>>>>>
> https://directory.fedoraproject.org/docs/389ds/FAQ/changelog-trimming.html
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Directory Server Development Team
> >>>>>
> >>>> _______________________________________________
> >>>> 389-users mailing list -- [email protected]
> >>>> To unsubscribe send an email to
> [email protected]
> >>>> Fedora Code of Conduct:
> >>>> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> >>>> List Guidelines:
> https://fedoraproject.org/wiki/Mailing_list_guidelines
> >>>> List Archives:
> >>>>
> https://lists.fedoraproject.org/archives/list/[email protected]
> >>>>
> >>>> Do not reply to spam on the list, report it:
> >>>> https://pagure.io/fedora-infrastructure
> >>>
> >>> --
> >>> Directory Server Development Team
> >>>
> >>
> > --
> > Directory Server Development Team
> >
> _______________________________________________
> 389-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedoraproject.org/archives/list/[email protected]
> Do not reply to spam on the list, report it:
> https://pagure.io/fedora-infrastructure
>


-- 
--

389 Directory Server Development Team
_______________________________________________
389-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/[email protected]
Do not reply to spam on the list, report it: 
https://pagure.io/fedora-infrastructure

Reply via email to