Thanks, Ludwig, for the suggestion, and thanks to Maciej for the confirmation from his end. This issue has been happening for us for several weeks, so I don’t think it is a transient problem.
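
One way to check whether the messages really are persistent is to count how often each missing CSN is reported per replication agreement. If the same few CSNs keep recurring over weeks, the problem is unlikely to be transient. Below is a minimal Python sketch, assuming the message format shown in the log excerpts quoted further down. The function name count_missing_csns and the regular expression are illustrative and not part of any FreeIPA or 389-ds tool.

```python
import re
from collections import Counter

# Illustrative pattern (an assumption based on the quoted log excerpts):
# captures the agreement name and the 20-hex-digit CSN from one log line.
CSN_RE = re.compile(r'agmt="(?P<agmt>[^"]+)".*?CSN (?P<csn>[0-9a-f]{20})\b')


def count_missing_csns(lines):
    """Count 'missing CSN' reports per (agreement, CSN) pair.

    `lines` is any iterable of log lines, one message per line.
    """
    counts = Counter()
    for line in lines:
        # Only consider the two message variants that report a missing CSN.
        if "not found" not in line and "Can't locate CSN" not in line:
            continue
        match = CSN_RE.search(line)
        if match:
            counts[(match.group("agmt"), match.group("csn"))] += 1
    return counts
```

Feeding it the lines of the log file (for example via `open(path)`) and comparing the result across days would show whether the set of missing CSNs changes or stays the same.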

What is the best way to sanitize the logs without removing useful info before sending them your way? Will the files listed under "Directory server failed" on https://www.freeipa.org/page/Files_to_be_attached_to_bug_report be sufficient?

I’ve also run the ipa_consistency_check script, and the output shows that something is indeed wrong with the sync:
“””
FreeIPA servers:    inf01         inf01         inf02         inf02         STATE
=================================================================================
Active Users        15            15            15            15            OK
Stage Users         0             0             0             0             OK
Preserved Users     3             3             3             3             OK
User Groups         9             9             9             9             OK
Hosts               45            45            45            46            FAIL
Host Groups         7             7             7             7             OK
HBAC Rules          6             6             6             6             OK
SUDO Rules          7             7             7             7             OK
DNS Zones           33            33            33            33            OK
LDAP Conflicts      NO            NO            NO            NO            OK
Ghost Replicas      2             2             2             2             FAIL
Anonymous BIND      YES           YES           YES           YES           OK
Replication Status  inf01.prod 0  inf01.dev 0   inf01.dev 0   inf01.dev 0
                    inf02.dev 0   inf02.dev 0   inf01.prod 0  inf01.prod 0
                    inf02.prod 0  inf02.prod 0  inf02.prod 0  inf02.dev 0
=================================================================================
“””

Thanks,
Goran

> On May 15, 2017, at 6:35 AM, Ludwig Krispenz <lkris...@redhat.com> wrote:
>
> The messages you see could be transient, and if replication is
> working then this seems to be the case. If not, we would need more data to
> investigate: deployment info, replicaIDs of all servers, RUVs, logs, ...
>
> Here is some background info: there are some scenarios where a CSN could not
> be found in the changelog, e.g. if updates were applied on the supplier during a
> total init, they could be part of the data and database RUV, but not in the
> changelog of the initialized replica.
> DS did try to use an alternative CSN in cases where it could not be found,
> but this had the risk of missing updates, so we decided to change it and make
> this missing CSN a non-fatal error: back off and retry. If another supplier
> has updated the replica in the meantime, the starting CSN could have
> changed and be found.
> So if the reported missing CSNs change and replication
> continues, everything is OK, although I think the messages should stop at some
> point.
>
> There is a configuration parameter for a replication agreement to trigger the
> previous behaviour of picking an alternative CSN:
> nsds5ReplicaIgnoreMissingChange
> with potential values "once", "always".
>
> where "once" just tries to kickstart replication by using another CSN and
> "always" changes the default behaviour
>
>
> On 05/11/2017 06:53 PM, Goran Marik wrote:
>> Hi,
>>
>> After an upgrade to CentOS 7.3.1611 with "yum update", we started seeing the
>> following messages in the logs:
>> “””
>> May 9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.519724479 +0000]
>> NSMMReplicationPlugin - changelog program -
>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN
>> 576b34e8000a050f0000 not found, we aren't as up to date, or we purged
>> May 9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.550459233 +0000]
>> NSMMReplicationPlugin -
>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data
>> required to update replica has been purged from the changelog. The replica
>> must be reinitialized.
>> May 9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.588245476 +0000]
>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389) -
>> Can't locate CSN 576b34e8000a050f0000 in the changelog (DB rc=-30988). If
>> replication stops, the consumer may need to be reinitialized.
>> May 9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.611400689 +0000]
>> NSMMReplicationPlugin - changelog program -
>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN
>> 576b34e8000a050f0000 not found, we aren't as up to date, or we purged
>> May 9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.642226385 +0000]
>> NSMMReplicationPlugin -
>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data
>> required to update replica has been purged from the changelog. The replica
>> must be reinitialized.
>> “””
>>
>> The log messages are pretty frequent, every few seconds, and report a few
>> different CSNs that cannot be located.
>>
>> This happens on only one replica out of 4. We’ve tried "ipa-replica-manage
>> re-initialize --from" and "ipa-csreplica-manage re-initialize --from" several
>> times, but while both commands report success, the log messages continue to
>> appear. The server was rebooted and "systemctl restart ipa" was done a few
>> times as well.
>>
>> The replica seems to be working fine despite the errors, but I’m worried
>> that the logs indicate an underlying problem we are not fully detecting. I
>> would like to understand better what is triggering this behaviour and how to
>> fix it, and whether anyone else has seen these messages after a recent upgrade.
>>
>> The software versions are 389-ds-base-1.3.5.10-20.el7_3.x86_64 and
>> ipa-server-4.4.0-14.el7.centos.7.x86_64
>>
>> Thanks,
>> Goran
>>
>> --
>> Goran Marik
>> Senior Systems Developer
>>
>> ecobee
>> 250 University Ave, Suite 400
>> Toronto, ON M5H 3E5
>
> --
> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
> Commercial register: Amtsgericht Muenchen, HRB 153243,
> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill,
> Eric Shander
>
> --
> Manage your subscription for the Freeipa-users mailing list:
> https://www.redhat.com/mailman/listinfo/freeipa-users
> Go to http://freeipa.org for more info on the project

--
Goran Marik
Senior Systems Developer

ecobee
250 University Ave, Suite 400
Toronto, ON M5H 3E5