Thank you very much for the response!

To start:
----
[root@orldc-prod-ipa01 ~]# rpm -qa 389-ds-base
389-ds-base-1.3.5.10-18.el7_3.x86_64
----

So, I believe a good part of my problem is that I'm not _positive_
which replica is good at this point (though my directory really isn't
that huge).

Do you have any pointers on a good method of comparing the directory
data between them?  I was wondering if anyone knows of any tools to
facilitate that.  I was thinking that it might make sense for me to
dump the DB and restore, but I really don't know that procedure.  As I
mentioned, my directory really isn't that large at all; however, I'm
not sure of the best step-by-step method to proceed.  (I know
I'm not helping things :) )
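
For illustration, here is a rough sketch of the kind of comparison meant
here.  It assumes each replica has already been exported to an LDIF file
(for example with ldapsearch -LLL against each server) and only diffs the
text of those exports.  The function names parse_ldif and compare are
hypothetical helpers, not part of any FreeIPA or 389-ds tool, and
base64-encoded values are compared in their encoded form.  Per-server
operational attributes, if exported, would show up as differences and
may need filtering.

```python
from typing import Dict, List, Set, Tuple

Entry = Set[Tuple[str, str]]


def parse_ldif(text: str) -> Dict[str, Entry]:
    """Parse LDIF text into {lowercased dn: {(attr, raw value), ...}}."""
    # Unfold continuation lines: a line starting with one space
    # continues the previous line.
    unfolded: List[str] = []
    for line in text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)

    entries: Dict[str, Entry] = {}
    current_dn = None
    for line in unfolded:
        if not line.strip():
            current_dn = None  # a blank line ends the current entry
            continue
        if line.startswith("#"):
            continue  # LDIF comment
        attr, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if attr.lower() == "dn":
            current_dn = value.lower()
            entries[current_dn] = set()
        elif current_dn is not None:
            entries[current_dn].add((attr.lower(), value))
    return entries


def compare(a: Dict[str, Entry], b: Dict[str, Entry]):
    """Return (DNs only in a, DNs only in b, DNs whose attributes differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(dn for dn in set(a) & set(b) if a[dn] != b[dn])
    return only_a, only_b, differing
```

Reading two export files and passing them through parse_ldif and
compare would list which entries exist on only one replica and which
differ, which might help in deciding which replica to treat as "good".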

Would it be acceptable to just 'assume' one of the replicas is good
(taking the risk of whatever missing pieces I'll have to deal with),
completely removing the others, and then rebuilding the replicas from
scratch?

If I go that route, what are the potential pitfalls?


I want to decide on an approach and try to resolve this once and for all.

Thanks again! It really is appreciated as I've been frustrated with
this for a while now.

-- Chris

On Tue, Mar 7, 2017 at 8:45 AM, Mark Reynolds <marey...@redhat.com> wrote:
> What version of 389-ds-base are you using?
>
> rpm -qa | grep 389-ds-base
>
>
> comments below..
>
> On 03/06/2017 02:37 PM, Christopher Young wrote:
>
> I've seen similar posts, but in the interest of asking fresh and
> trying to understand what is going on, I thought I would ask for
> advice on how best to handle this situation.
>
> In the interest of providing some history:
> I have three (3) FreeIPA servers.  Everything is running 4.4.0 now.
> The originals (orldc-prod-ipa01, orldc-prod-ipa02) were upgraded from
> the 3.x branch quite a while back.  Everything had been working fine,
> however I ran into a replication issue (that I _think_ may have been a
> result of IPv6 being disabled by my default Ansible roles).  I thought
> I had resolved that by reinitializing the 2nd replica,
> orldc-prod-ipa02.
>
> In any case, I feel like the replication has never been fully stable
> since then, and I have all types of errors in messages that indicate
> something is off.  I had since introduced a 3rd replica such that the
> agreements would look like so:
>
> orldc-prod-ipa01 -> orldc-prod-ipa02 ---> bohdc-prod-ipa01
>
> It feels like orldc-prod-ipa02 & bohdc-prod-ipa01 are out of sync.
> I've tried reinitializing them in order but with no positive results.
> At this point, I feel like I'm ready to 'bite the bullet' and tear
> them down quickly (remove them from IPA, delete the local
> DBs/directories) and rebuild them from scratch.
>
> I want to minimize my impact as much as possible (which I can somewhat
> do by redirecting LDAP/DNS requests via my load-balancers temporarily)
> and do this right.
>
> (Getting to the point...)
>
> I'd like advice on the order of operations to do this.  Given the
> errors (I'll include samples at the bottom of this message), does it
> make sense for me to remove the replicas on bohdc-prod-ipa01 &
> orldc-prod-ipa02 (in that order), wipe out any directories/residual
> pieces (I'd need some idea of what to do there), and then create new
> replicas? -OR-  Should I export/backup the LDAP DB and rebuild
> everything from scratch?
>
> I need advice and ideas.  Furthermore, if there is someone with
> experience in this that would be interested in making a little money
> on the side, let me know, because having an extra brain and set of
> hands would be welcome.
>
> DETAILS:
> =================
>
>
> ERRORS I see on orldc-prod-ipa01 (the one whose LDAP DB seems the most
> up-to-date since my changes are usually directed at it):
> ------
> Mar  6 14:36:24 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:24.434956575 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:36:25 orldc-prod-ipa01 ipa-dnskeysyncd: ipa         : INFO
>   LDAP bind...
> Mar  6 14:36:25 orldc-prod-ipa01 ipa-dnskeysyncd: ipa         : INFO
>   Commencing sync process
> Mar  6 14:36:26 orldc-prod-ipa01 ipa-dnskeysyncd:
> ipa.ipapython.dnssec.keysyncer.KeySyncer: INFO     Initial LDAP dump
> is done, sychronizing with ODS and BIND
> Mar  6 14:36:27 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:27.799519203 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:36:30 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:30.994760069 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:36:34 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:34.940115481 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:36:35 orldc-prod-ipa01 named-pkcs11[32134]: client
> 10.26.250.66#49635 (56.10.in-addr.arpa): transfer of
> '56.10.in-addr.arpa/IN': AXFR-style IXFR started
> Mar  6 14:36:35 orldc-prod-ipa01 named-pkcs11[32134]: client
> 10.26.250.66#49635 (56.10.in-addr.arpa): transfer of
> '56.10.in-addr.arpa/IN': AXFR-style IXFR ended
> Mar  6 14:36:37 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:37.977875463 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:36:40 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:40.999275184 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:36:45 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:45.211260414 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> ------
>
> These messages indicate that the replica does not have the same database as
> the master.  So either the master or the replica needs to be reinitialized.
> More on this below...
>
>
> Errors on orldc-prod-ipa02:
> ------
> Mar  6 14:16:04 orldc-prod-ipa02 ipa-dnskeysyncd: ipa         : INFO
> Commencing sync process
> Mar  6 14:16:04 orldc-prod-ipa02 ipa-dnskeysyncd:
> ipa.ipapython.dnssec.keysyncer.KeySyncer: INFO     Initial LDAP dump
> is done, sychronizing with ODS and BIND
> Mar  6 14:16:05 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:05.934405274 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:05 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:05.937278142 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:05 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:05.939434025 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
>
> These are harmless "errors" which have been removed in newer versions of
> 389-ds-base.
>
> Mar  6 14:16:06 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:06.882795654 -0500]
> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) -
> Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988).
> If replication stops, the consumer may need to be reinitialized.
> Mar  6 14:16:06 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:06.886029272 -0500] NSMMReplicationPlugin -
> changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local"
> (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't
> as up to date, or we purged
>
> This "could" also be a known issue that is fixed in newer versions of
> 389-ds-base.  Or this is a valid error message due to the replica being
> stale for a very long time and records actually being purged from the
> changelog before they were replicated.
>
> Mar  6 14:16:06 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:06.888679268 -0500] NSMMReplicationPlugin -
> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389):
> Data required to update replica has been purged from the changelog.
> The replica must be reinitialized.
> Mar  6 14:16:06 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:06.960804253 -0500] NSMMReplicationPlugin -
> agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa01:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
>
> Okay, so your replication agreements/servers are not in sync.  I suspect you
> created a new replica and used that to initialize a valid replica which
> broke things.  Something like that.  You need to find a "good" replica
> server and reinitialize the other replicas from that server.  These errors
> need to be addressed ASAP, as they are halting replication for those
> agreements, which explains the "instability" you are describing.
>
> Mark
>
> Mar  6 14:16:08 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:08.960622608 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:08 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:08.968927168 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:08 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:08.976952118 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:09 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:09.972315877 -0500] NSMMReplicationPlugin -
> agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa01:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:16:10 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:10.034810948 -0500]
> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) -
> Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988).
> If replication stops, the consumer may need to be reinitialized.
> Mar  6 14:16:10 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:10.040020359 -0500] NSMMReplicationPlugin -
> changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local"
> (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't
> as up to date, or we purged
> Mar  6 14:16:10 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:10.042846879 -0500] NSMMReplicationPlugin -
> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389):
> Data required to update replica has been purged from the changelog.
> The replica must be reinitialized.
> Mar  6 14:16:13 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:13.013253769 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:13 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:13.021514225 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:13 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:13.027521508 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:13 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:13.110566247 -0500] NSMMReplicationPlugin -
> agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa01:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:16:14 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:14.179819300 -0500]
> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) -
> Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988).
> If replication stops, the consumer may need to be reinitialized.
> Mar  6 14:16:14 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:14.188353328 -0500] NSMMReplicationPlugin -
> changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local"
> (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't
> as up to date, or we purged
> Mar  6 14:16:14 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:14.196463928 -0500] NSMMReplicationPlugin -
> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389):
> Data required to update replica has been purged from the changelog.
> The replica must be reinitialized.
> Mar  6 14:16:17 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:17.068292919 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:17 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:17.071241757 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:17 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:17.073793922 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> ------
>
>
> Thanks in advance!!!
>
> -- Chris
>
>
