I've seen similar posts, but in the interest of asking fresh and trying to understand what is going on, I thought I would ask for advice on how best to handle this situation.
Some history: I have three (3) FreeIPA servers, all running 4.4.0 now. The originals (orldc-prod-ipa01, orldc-prod-ipa02) were upgraded from the 3.x branch quite a while back. Everything had been working fine until I ran into a replication issue (that I _think_ may have been a result of IPv6 being disabled by my default Ansible roles). I thought I had resolved that by reinitializing the 2nd replica, orldc-prod-ipa02. In any case, I feel like replication has never been fully stable since then, and I have all types of errors in the messages log that indicate something is off.

I have since introduced a 3rd replica, so the agreements look like this:

orldc-prod-ipa01 -> orldc-prod-ipa02 ---> bohdc-prod-ipa01

It feels like orldc-prod-ipa02 & bohdc-prod-ipa01 are out of sync. I've tried reinitializing them in order, but with no positive results. At this point, I feel like I'm ready to 'bite the bullet' and tear them down quickly (remove them from IPA, delete the local DBs/directories) and rebuild them from scratch. I want to minimize the impact as much as possible (which I can somewhat do by temporarily redirecting LDAP/DNS requests via my load-balancers) and do this right.

(Getting to the point...) I'd like advice on the order of operations to do this. Given the errors (I'll include samples at the bottom of this message), does it make sense for me to remove the replicas on bohdc-prod-ipa01 & orldc-prod-ipa02 (in that order), wipe out any directories/residual pieces (I'd need some idea of what to do there), and then create new replicas?

-OR-

Should I export/back up the LDAP DB and rebuild everything from scratch?

I need advice and ideas. Furthermore, if there is someone with experience in this who would be interested in making a little money on the side, let me know, because having an extra brain and set of hands would be welcome.
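In case it helps anyone reading the samples under DETAILS, here is a rough Python sketch for tallying the ns-slapd replication messages per agreement. It is only an illustration: the function name `tally`, the short labels, and the matched message fragments are my own choices based on the lines pasted below, not anything provided by FreeIPA or 389 Directory Server.

```python
import re
import sys
from collections import Counter

# Agreement name and target host as they appear in the pasted ns-slapd lines,
# e.g. agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389)
AGMT_RE = re.compile(r'agmt="cn=(?P<agmt>[^"]+)" \((?P<host>[^:)]+):\d+\)')

# Hypothetical short labels for message fragments seen in the samples.
# Matching is a plain substring test; the first fragment found wins.
KINDS = {
    "different database generation ID": "generation-id-mismatch",
    "Can't locate CSN": "csn-missing",
    "not found, we aren't as up to date": "csn-missing",
    "purged from the changelog": "changelog-purged",
}


def tally(lines):
    """Count (agreement, host, label) triples across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = AGMT_RE.search(line)
        if not match:
            continue  # not a replication-agreement message
        for fragment, label in KINDS.items():
            if fragment in line:
                counts[(match.group("agmt"), match.group("host"), label)] += 1
                break
    return counts


if __name__ == "__main__":
    # Usage: python tally_repl.py <logfile>  (the script name is arbitrary)
    with open(sys.argv[1], errors="replace") as handle:
        for (agmt, host, label), n in sorted(tally(handle).items()):
            print(f"{n:6d}  {label:24s}  {agmt} -> {host}")
```

Lines without an `agmt="..."` part (the attrlist_replace, ipa-dnskeysyncd and named-pkcs11 entries, for example) are skipped, so the output only covers the replication agreements themselves.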
DETAILS:
=================
ERRORS I see on orldc-prod-ipa01 (the one whose LDAP DB seems the most up-to-date since my changes are usually directed at it):
------
Mar 6 14:36:24 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:24.434956575 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
Mar 6 14:36:25 orldc-prod-ipa01 ipa-dnskeysyncd: ipa : INFO LDAP bind...
Mar 6 14:36:25 orldc-prod-ipa01 ipa-dnskeysyncd: ipa : INFO Commencing sync process
Mar 6 14:36:26 orldc-prod-ipa01 ipa-dnskeysyncd: ipa.ipapython.dnssec.keysyncer.KeySyncer: INFO Initial LDAP dump is done, sychronizing with ODS and BIND
Mar 6 14:36:27 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:27.799519203 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
Mar 6 14:36:30 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:30.994760069 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
Mar 6 14:36:34 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:34.940115481 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
Mar 6 14:36:35 orldc-prod-ipa01 named-pkcs11[32134]: client 10.26.250.66#49635 (56.10.in-addr.arpa): transfer of '56.10.in-addr.arpa/IN': AXFR-style IXFR started
Mar 6 14:36:35 orldc-prod-ipa01 named-pkcs11[32134]: client 10.26.250.66#49635 (56.10.in-addr.arpa): transfer of '56.10.in-addr.arpa/IN': AXFR-style IXFR ended
Mar 6 14:36:37 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:37.977875463 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
Mar 6 14:36:40 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:40.999275184 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
Mar 6 14:36:45 orldc-prod-ipa01 ns-slapd: [06/Mar/2017:14:36:45.211260414 -0500] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa02:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
------
Errors on orldc-prod-ipa02:
------
r 6 14:16:04 orldc-prod-ipa02 ipa-dnskeysyncd: ipa : INFO Commencing sync process
Mar 6 14:16:04 orldc-prod-ipa02 ipa-dnskeysyncd: ipa.ipapython.dnssec.keysyncer.KeySyncer: INFO Initial LDAP dump is done, sychronizing with ODS and BIND
Mar 6 14:16:05 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:05.934405274 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
Mar 6 14:16:05 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:05.937278142 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
Mar 6 14:16:05 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:05.939434025 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
Mar 6 14:16:06 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:06.882795654 -0500] agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) - Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
Mar 6 14:16:06 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:06.886029272 -0500] NSMMReplicationPlugin - changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't as up to date, or we purged
Mar 6 14:16:06 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:06.888679268 -0500] NSMMReplicationPlugin - agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389): Data required to update replica has been purged from the changelog. The replica must be reinitialized.
Mar 6 14:16:06 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:06.960804253 -0500] NSMMReplicationPlugin - agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa01:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
Mar 6 14:16:08 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:08.960622608 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
Mar 6 14:16:08 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:08.968927168 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
Mar 6 14:16:08 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:08.976952118 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
Mar 6 14:16:09 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:09.972315877 -0500] NSMMReplicationPlugin - agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa01:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
Mar 6 14:16:10 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:10.034810948 -0500] agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) - Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
Mar 6 14:16:10 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:10.040020359 -0500] NSMMReplicationPlugin - changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't as up to date, or we purged
Mar 6 14:16:10 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:10.042846879 -0500] NSMMReplicationPlugin - agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389): Data required to update replica has been purged from the changelog. The replica must be reinitialized.
Mar 6 14:16:13 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:13.013253769 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
Mar 6 14:16:13 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:13.021514225 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
Mar 6 14:16:13 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:13.027521508 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
Mar 6 14:16:13 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:13.110566247 -0500] NSMMReplicationPlugin - agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat" (orldc-prod-ipa01:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
Mar 6 14:16:14 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:14.179819300 -0500] agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) - Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
Mar 6 14:16:14 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:14.188353328 -0500] NSMMReplicationPlugin - changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't as up to date, or we purged
Mar 6 14:16:14 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:14.196463928 -0500] NSMMReplicationPlugin - agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389): Data required to update replica has been purged from the changelog. The replica must be reinitialized.
Mar 6 14:16:17 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:17.068292919 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
Mar 6 14:16:17 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:17.071241757 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
Mar 6 14:16:17 orldc-prod-ipa02 ns-slapd: [06/Mar/2017:14:16:17.073793922 -0500] attrlist_replace - attr_replace (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca) failed.
------

Thanks in advance!!!

--
Chris

--
Manage your subscription for the Freeipa-users mailing list: https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project