Hi, I suspect that Kees discovered a new bug ... As the retro changelog is persistent across reboot, it is normal that their timestamp are absolute, so IMHO there is no way relative time should be used/seen in trimming computation. (in other words cur_time is wrong. ) My suspicion is that we got a regression when the slapi_eq timers where rewritten to use relative time instead of absolute time) (cur_time is provided by slapi_eq API and is now relative ( it should rather be initialized to time(0) ) The funny thing is that nobody have seen that retrocl trimming was broken for so long ...
Regards Pierre, On Wed, Aug 11, 2021 at 11:50 AM Kees Bakker <[email protected]> wrote: > On 09-08-2021 20:23, Mark Reynolds wrote: > > On 8/9/21 1:47 PM, Kees Bakker wrote: > >> On 09-08-2021 19:25, Mark Reynolds wrote: > >>> On 8/9/21 1:16 PM, Kees Bakker wrote: > >>>> On 09-08-2021 18:43, Mark Reynolds wrote: > >>>>> On 8/9/21 11:20 AM, Kees Bakker wrote: > >>>>>> On 09-08-2021 16:00, Mark Reynolds wrote: > >>>>>>> On 8/9/21 8:09 AM, Kees Bakker wrote: > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> When my dirsrv was trying to compact the databases I was getting > >>>>>>>> this > >>>>>>>> error > >>>>>>>> > >>>>>>>> [07/Aug/2021:23:59:02.715984489 +0200] - NOTICE - bdb_compact - > >>>>>>>> Compacting databases ... > >>>>>>>> [07/Aug/2021:23:59:02.765932397 +0200] - NOTICE - bdb_compact - > >>>>>>>> Compacting DB start: userRoot > >>>>>>>> [07/Aug/2021:23:59:03.518175414 +0200] - NOTICE - bdb_compact - > >>>>>>>> compactdb: compact userRoot - 417 pages freed > >>>>>>>> [07/Aug/2021:23:59:03.576427786 +0200] - NOTICE - bdb_compact - > >>>>>>>> Compacting DB start: ipaca > >>>>>>>> [07/Aug/2021:23:59:03.659941533 +0200] - NOTICE - bdb_compact - > >>>>>>>> compactdb: compact ipaca - 419 pages freed > >>>>>>>> [07/Aug/2021:23:59:03.718445310 +0200] - NOTICE - bdb_compact - > >>>>>>>> Compacting DB start: changelog > >>>>>>>> [08/Aug/2021:00:00:40.807571334 +0200] - NOTICE - > >>>>>>>> NSMMReplicationPlugin - changelog program - cl5CompactDBs - > >>>>>>>> compacting > >>>>>>>> replication changelogs... > >>>>>>>> [08/Aug/2021:00:00:54.309357211 +0200] - ERR - libdb - BDB2055 > Lock > >>>>>>>> table is out of available lock entries > >>>>>>>> [08/Aug/2021:00:00:54.726504736 +0200] - ERR - bdb_compact - > >>>>>>>> compactdb: failed to compact changelog; db error - 12 Cannot > >>>>>>>> allocate > >>>>>>>> memory > >>>>>>>> [08/Aug/2021:00:00:54.801571421 +0200] - ERR - libdb - BDB2055 > Lock > >>>>>>>> table is out of available lock entries > >>>>>>>> [08/Aug/2021:00:00:54.876618702 +0200] - ERR - > >>>>>>>> NSMMReplicationPlugin - > >>>>>>>> changelog program - cl5CompactDBs - Failed to compact > >>>>>>>> a797bb0b-be1d11eb-88c0b677-613aa2ad; db error - 12 Cannot allocate > >>>>>>>> memory > >>>>>>>> [08/Aug/2021:00:00:57.253006449 +0200] - NOTICE - bdb_compact - > >>>>>>>> Compacting databases finished. > >>>>>>>> > >>>>>>>> There are about 402k entries in cn=changelog. > >>>>>>>> > >>>>>>>> I have a few questions > >>>>>>>> 1) is it normal to have so many entries in cn=changelog? On > another > >>>>>>>> replica I have almost 3M entries Isn't this cleaned up? > >>>>>>>> 2) the number of locks is 50000 (there are two config items). > >>>>>>>> Should I > >>>>>>>> increase that number? If so, increase to what? > >>>>>>>> 3) is there maybe something else going on, causing the > >>>>>>>> exhaustion of > >>>>>>>> the locks? > >>>>>>> > >>>>>>> Ok, so by default there is no changelog trimming enabled. So the > >>>>>>> changelog will grow without bounds, which is bad. > >>>>>> > >>>>>> How much of this [1] applies? Indeed it says "By default the > >>>>>> Replication Changelog does not use any trimming by default." > >>>>>> So, how is the trimming actually enabled? I have these entries > >>>>>> > >>>>>> dn: cn=changelog5,cn=config > >>>>>> cn: changelog5 > >>>>>> nsslapd-changelogdir: /var/lib/dirsrv/slapd-EXAMPLE-COM/cldb > >>>>>> nsslapd-changelogmaxage: 30d > >>>>>> objectClass: top > >>>>>> objectClass: extensibleobject > >>>>> > >>>>> So trimming is set, but it's set to 30 days (30d), that > >>>>> could/should be > >>>>> shortened to "7d". > >>>> > >>>> Somehow I doubt that this will do anything. > >>> > >>> Ahhh, I thought this was the Replication Changelog, but it is the > >>> "Retro" Changelog. Ok so check this out: > >>> > >>> > https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/managing_replication-using_the_retro_changelog_plug_in#Using_the_Retro_Changelog_Plug_in-Trimming_the_Retro_Changelog > >>> > >>> > >>> Or update config directly using ldapmodify: > >>> > >>> # ldapmodify -D ... > >>> dn: cn=Retro Changelog Plugin,cn=plugins,cn=config > >>> changetype: modify > >>> replace: nsslapd-changelogmaxage: 7d > >>> nsslapd-changelogmaxage: 7d > >> > >> (( Typo? The replace line is without the value )) > > yes, sorry... > >> > >> Hmm, that maxage was already at 2 days > > > > Ok, perhaps you are running into a known bug where indexes created by > > plugins are not enabled. Or might be a config issue? > > > > Try this first. Set in the retro changelog entry: > > > > nsslapd-changelog-trim-interval: 300 <--- 5 minutes > (Sorry to be picky, but it is nsslapd-changelogtrim-interval. It's default > is 300.) > > > > Restart the server to make sure it was applied. You can enable "plugin" > > logging to see what it does. > > > > nsslapd-errorlog-level: 65536 > > > > wait over 5 minutes, then grep the errors log for "DSRetroclPlugin". > > This will tell us what the plugin is doing when it goes to trim the > > database. > > > > IMPORTANT, disable plugin logging immediately after you get your data. > > Set the errorlog level to 0 to go back to default logging. > > > > If the logging shows it's not finding any candidates then it might be > > the indexing issue I mentioned above. In that case you just need to > > reindex the retro changelog: > > Before changing anything I enabled Plug-in debugging which revealed > something interesting > > [root@linge ~]# grep trim > /var/log/dirsrv/slapd-GHS-NL/errors-plugin-debug-20210811-1108 > [11/Aug/2021:11:01:49.792450548 +0200] - DEBUG - DSRetroclPlugin - cltrim: > ldrc=0, first_time=1622032141, cur_time=3702600 > [11/Aug/2021:11:01:49.842339467 +0200] - DEBUG - DSRetroclPlugin - > retrocl_housekeeping - changelog does not need to be trimmed > [11/Aug/2021:11:06:49.953401407 +0200] - DEBUG - DSRetroclPlugin - cltrim: > ldrc=0, first_time=1622032141, cur_time=3702900 > [11/Aug/2021:11:06:50.012249703 +0200] - DEBUG - DSRetroclPlugin - > retrocl_housekeeping - changelog does not need to be trimmed > > This shows that the interval is 300 (even if > nsslapd-changelogtrim-interval is not explicitly set in > cn=changelog5,cn=config > > Look at cur_time. That is the "uptime" of the system, not UTC. > > [root@linge ~]# uptime > 11:21:42 up 42 days, 20:49, 2 users, load average: 0.15, 0.14, 0.16 > > Next, assuming this code from retrocl_trim.c is making the decision to trim > > first_time = retrocl_getchangetime(SLAPI_SEQ_FIRST, &ldrc); > slapi_log_err(SLAPI_LOG_PLUGIN, RETROCL_PLUGIN_NAME, > "cltrim: ldrc=%d, first_time=%ld, > cur_time=%ld\n", > ldrc, first_time, cur_time); > if (LDAP_SUCCESS == ldrc && first_time > (time_t)0L && > first_time + ts.ts_c_max_age < cur_time) { > must_trim = 1; > } > > There is a mixup of absolute (first_time) and relative (cur_time) time. > Which of the two is wrong? > > > > > # dsctl slapd-YOUR_INSTANCE stop > > > > # dsctl slapd-YOUR_INSTANCE db2index changelog > > > > # dsctl slapd-YOUR_INSTANCE start > > > > Then wait 5 minutes and see if anything was trimmed. > > > > Mark > > > >> > >> dn: cn=Retro Changelog Plugin,cn=plugins,cn=config > >> cn: Retro Changelog Plugin > >> nsslapd-attribute: nsuniqueid:targetUniqueId > >> nsslapd-changelogmaxage: 2d > >> nsslapd-include-suffix: cn=dns,dc=example,dc=com > >> nsslapd-plugin-depends-on-named: Class of Service > >> nsslapd-plugin-depends-on-type: database > >> nsslapd-pluginDescription: Retrocl Plugin > >> nsslapd-pluginEnabled: on > >> nsslapd-pluginId: retrocl > >> nsslapd-pluginInitfunc: retrocl_plugin_init > >> nsslapd-pluginPath: libretrocl-plugin > >> nsslapd-pluginType: object > >> nsslapd-pluginVendor: 389 Project > >> nsslapd-pluginVersion: 1.4.3.231 > >> nsslapd-pluginbetxn: on > >> nsslapd-pluginprecedence: 25 > >> objectClass: top > >> objectClass: nsSlapdPlugin > >> objectClass: extensibleObject > >> > >>> > >>> > >>> HTH, > >>> > >>> Mark > >>> > >>>> > >>>> The first entry in cn=changelog is more than two months old. (May 26th > >>>> is about > >>>> the time I installed this server as a FreeIPA replica.) > >>>> > >>>> dn: changenumber=1,cn=changelog > >>>> objectClass: top > >>>> objectClass: changelogentry > >>>> objectClass: extensibleObject > >>>> targetuniqueid: 1e89ebcb-e20111e8-8820f96e-fc0c60a4 > >>>> changeNumber: 1 > >>>> targetDn: > >>>> idnsname=_ldap._tcp,idnsname=example.com.,cn=dns,dc=example,dc=com > >>>> changeTime: 20210526122901Z > >>>> changeType: modify > >>>> changes:: YWRkOiBzUlZSZWNvc...gplbnRyeXVzbjogMTEyCi0KAA== > >>>> > >>>> Is there a way to see trimming in action? I'm afraid it is not doing > >>>> anything. > >> > >> What diagnostic can I enable to see the trimming in action? > >> > >>>> > >>>>> > >>>>>> > >>>>>> > >>>>>>> > >>>>>>> I recommend setting up the changelog max age to 7 days: > >>>>>>> > >>>>>>> > https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/trimming_the_replication_changelog > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> Once that the trimming cleans up the changelog the database > >>>>>>> compaction > >>>>>>> should work without issue. You can also increase the database > >>>>>>> locks to > >>>>>>> 1 million (that does require a restart of the server to take > >>>>>>> effect). > >>>>>> > >>>>>> Let's hope that 1 million is enough :-) > >>>>> > >>>>> It might not be, you can keep bumping it up if needed. There is no > >>>>> limit, or negative impact on db performance. > >>>>> > >>>>> Mark > >>>>> > >>>>>> > >>>>>>> > >>>>>>> HTH, > >>>>>>> > >>>>>>> Mark > >>>>>>> > >>>>>> [1] > >>>>>> > https://directory.fedoraproject.org/docs/389ds/FAQ/changelog-trimming.html > >>>>>> > >>>>>> > >>>>> > >>>>> -- > >>>>> Directory Server Development Team > >>>>> > >>>> _______________________________________________ > >>>> 389-users mailing list -- [email protected] > >>>> To unsubscribe send an email to > [email protected] > >>>> Fedora Code of Conduct: > >>>> https://docs.fedoraproject.org/en-US/project/code-of-conduct/ > >>>> List Guidelines: > https://fedoraproject.org/wiki/Mailing_list_guidelines > >>>> List Archives: > >>>> > https://lists.fedoraproject.org/archives/list/[email protected] > >>>> > >>>> Do not reply to spam on the list, report it: > >>>> https://pagure.io/fedora-infrastructure > >>> > >>> -- > >>> Directory Server Development Team > >>> > >> > > -- > > Directory Server Development Team > > > _______________________________________________ > 389-users mailing list -- [email protected] > To unsubscribe send an email to [email protected] > Fedora Code of Conduct: > https://docs.fedoraproject.org/en-US/project/code-of-conduct/ > List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines > List Archives: > https://lists.fedoraproject.org/archives/list/[email protected] > Do not reply to spam on the list, report it: > https://pagure.io/fedora-infrastructure > -- -- 389 Directory Server Development Team
_______________________________________________ 389-users mailing list -- [email protected] To unsubscribe send an email to [email protected] Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/[email protected] Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
