Re: [Freeipa-users] Replication has stopped and server errors

Martin Basti Fri, 06 Jan 2017 09:03:45 -0800


On 06.01.2017 00:29, sipazzo wrote:

I have 6 IPA servers in 3 locations running 4.2.0-15.0.1 on RHEL 7. Ipa1-dev is the CA Renewal and CRL Master server and where most of our updates (host enrollment, password changes) end up taking place. Servers had been running fine. Over the holidays we started having some replication issues, and looking at /var/log/dirsrv/slapd-REALM-COM/errors showed the following:
All servers currently have these errors for each replica the respective IPA servers are connected to:
NSMMReplicationPlugin - agmt="cn=meToipa2-dr.example.local" (ipa2-dr:389): Incremental update failed and requires administrator action
[04/Jan/2017:15:39:48 -0800] agmt="cn=meToipa1-dr.example.local" (ipa1-dr:389) - Can't locate CSN 583c8e74000600110000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized
NSMMReplicationPlugin - agmt="cn=meToipa1-prod.example.local" (ipa1-prod:389): Data required to update replica has been purged. The replica must be reinitialized.
[04/Jan/2017:13:33:26 -0800] NSMMReplicationPlugin - agmt="cn=meToipa2-dev.example.local" (ipa2-dev:389): Incremental update failed and requires administrator action
[04/Jan/2017:13:33:26 -0800] NSMMReplicationPlugin - agmt="cn=meToipa1-prod.example.local" (ipa1-prod:389): Incremental update failed and requires administrator action
[04/Jan/2017:13:33:27 -0800] agmt="cn=meToipa2-prod.example.local" (ipa2-prod:389) - Can't locate CSN 586d69f0000400120000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
And all servers have these types of errors, which are worrisome, but they go back quite a way:
NSACLPlugin - The ACL target cn=dns,dc=example,dc=local does not exist
NSACLPlugin - The ACL target cn=dns,dc=example,dc=local does not exist
NSACLPlugin - The ACL target cn=groups,cn=compat,dc=example,dc=local does not exist
NSACLPlugin - The ACL target cn=computers,cn=compat,dc=example,dc=local does not exist
NSACLPlugin - The ACL target cn=casigningcertcert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=example,dc=local does not exist
NSACLPlugin - The ACL target cn=casigningcertcert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=example,dc=local does not exist
NSACLPlugin - The ACL target ou=sudoers,dc=networkfleet,dc=local does not exist

^^^ just INFO messages, you can ignore them

All servers except one have a lot of these
DSRetroclPlugin - delete_changerecord: could not delete change record
Ipa1-dev only has this
[04/Jan/2017:18:36:52 -0800] NSMMReplicationPlugin - agmt="cn=masterAgreement1-ipa1-prod.example.local-pki-tomcat" (ipa1-prod:389): Replication bind with SIMPLE auth resumed
[04/Jan/2017:18:36:52 -0800] NSMMReplicationPlugin - agmt="cn=masterAgreement1-ipa2-dr.example.local-pki-tomcat" (ipa2-dr:389): Replication bind with SIMPLE auth resumed
[04/Jan/2017:18:36:52 -0800] NSMMReplicationPlugin - agmt="cn=masterAgreement1-ipa1-dr.example.local-pki-tomcat" (ipa1-dr:389): Replication bind with SIMPLE auth resumed
[04/Jan/2017:18:36:53 -0800] NSMMReplicationPlugin - agmt="cn=masterAgreement1-ipa2-prod.example.local-pki-tomcat" (ipa2-prod:389): Replication bind with SIMPLE auth resumed
3 servers (ipa1-dr ipa2-dr ipa2-prod) have these errors:
[01/Jan/2017:14:43:06 -0800] - libdb: BDB2055 Lock table is out of available lock entries
[01/Jan/2017:14:43:06 -0800] - compactdb: failed to compact changelog; db error - 12 Cannot allocate memory

You probably need https://access.redhat.com/solutions/1241063 to increase the number of locks (or see this thread: https://lists.fedoraproject.org/pipermail/389-users/2011-June/013299.html).

I would first increase the number of locks and then check whether anything has improved. We also don't know what your topology looks like, i.e. which servers are connected to each other.
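
For reference, a minimal sketch of what raising the lock count might look like. It assumes the setting in question is the nsslapd-db-locks attribute on the ldbm database config entry of 389 Directory Server. The value 50000 is purely illustrative and is not taken from either link above, so pick a number that fits your data and memory.

```ldif
# Hypothetical example file (e.g. locks.ldif); the value below is an
# assumption for illustration, not a recommendation from the linked articles.
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-db-locks
nsslapd-db-locks: 50000
```

Something like "ldapmodify -x -D "cn=Directory Manager" -W -f locks.ldif" on each affected server should apply it, and as far as I know the directory server has to be restarted (for example with "ipactl restart") before the new lock count takes effect.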


Martin

4 servers (ipa1-dev, ipa2-dev, ipa1-dr and ipa2-dr) have these errors
[04/Jan/2017:15:37:21 -0800] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact LDAP server) ((null)) errno 107 (Transport endpoint is not connected)
[04/Jan/2017:15:37:24 -0800] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact LDAP server) ((null)) errno 107 (Transport endpoint is not connected)
I have tried various combinations of restarting, re-initializing, disconnecting and reconnecting replicas, but I am down to only two servers replicating with each other currently (ipa1-dev and ipa2-dev). We did have a power outage at the dev location, but it does not seem to correspond to when the errors started. I am not sure how to recover from this. Any help is appreciated.
