After a fair amount of work, I was able to get my system back to a state where 
it seems to be replicating OK, though not with FreeIPA 4.2.0.  Because this was 
a production system with several hundred users and computers attached to it, 
wiping the domain was not an option, so I decided to take a chance that the new 
replication topology features would help.

I replaced each CentOS 7 domain controller with a Fedora 23 FreeIPA 4.2.3 host, 
and while doing so I noticed some odd behavior with the RUVs.  I know about the 
current bug where deleting a replica doesn't delete its RUV, and I experienced 
that.  I would add a cleanallruv task entry like this:

dn: cn=clean 4, cn=cleanallruv, cn=tasks, cn=config
objectclass: top
objectclass: extensibleObject
replica-base-dn: dc=mydomain,dc=net
replica-id: 4
replica-force-cleaning: yes
cn: clean 4
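
To be clear, I was feeding that entry to the directory server with the standard 
OpenLDAP client, bound as Directory Manager, roughly like this (the hostname and 
the clean-ruv-4.ldif file name are just placeholders from my setup):

# Add the cleanallruv task entry shown above; the LDIF file name is arbitrary
ldapmodify -a -x -D "cn=Directory Manager" -W \
    -H ldap://dc1-ipa-dev-van.mydomain.net -f clean-ruv-4.ldif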

It would only fail if the server did not yet have an agreement with the new 
Fedora replica for that host.  I.e., if the old CentOS host had replica ID 4 
and the new Fedora host had replica ID 15, and I had an agreement with 15, that 
LDAP task would clean 4; but if I had no agreement with 15, it would fail.
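
In practice that meant checking which masters each server actually had 
agreements with before submitting the clean task; something along these lines 
(the hostname is just one of mine):

# Show the replication agreements this master participates in
ipa-replica-manage list dc1-ipa-dev-van.mydomain.net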

After a while I had every server in an agreement with all of the others and got 
all the old RUVs cleared.
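
To confirm the old IDs were really gone I just re-listed the RUVs afterwards, 
roughly:

# List the replica IDs still known to this server
# (prompts for the Directory Manager password)
ipa-replica-manage list-ruv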

I was still experiencing strange error messages in my logs with FreeIPA 4.2.3 
so I decided to go all the way to 4.3.0.

Here are the 4.2.3 errors:

[16/Jan/2016:22:29:12 -0800] NSMMReplicationPlugin - 
replica_replace_ruv_tombstone: failed to update replication update vector for 
replica dc=mydomain,dc=net: LDAP error - 53
[16/Jan/2016:22:29:13 -0800] NSMMReplicationPlugin - agmt_delete: begin
[16/Jan/2016:22:32:51 -0800] slapi_ldap_bind - Error: could not bind id 
[cn=Replication Manager 
masterAgreement1-dc2-ipa-dev-van.mydomain.net-pki-tomcat,ou=csusers,cn=config] 
authentication mechanism [SIMPLE]: error 32 (No such object) errno 0 (Success)

Of the 4 servers, 3 upgrades to 4.3.0 went smoothly, and 1 just hung during the 
%post section of the dnf install for an hour, with the ns-slapd process taking 
100% CPU on all 4 cores until I stopped it.  A subsequent ipa-server-upgrade 
fixed everything.
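
Roughly, the recovery on that one host amounted to the following; treat it as a 
sketch, since exactly what I killed and when was a bit ad hoc:

# Stop the spinning directory server instance(s), then re-run the IPA upgrade
systemctl stop dirsrv.target
ipa-server-upgrade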

With the new replication topology management graph and controls in the UI, I 
was able to find some missing segments and replace some that were, for some 
reason, only one-way.
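
For reference, the same segment information is available from the CLI in 4.3; 
listing the domain-suffix segments and re-adding a missing one looks roughly 
like this (the segment name and node hostnames below are placeholders):

# List the existing topology segments for the domain suffix
ipa topologysegment-find domain
# Re-create a missing segment between two masters
ipa topologysegment-add domain dc1-van-to-dc1-nvan \
    --leftnode=dc1-ipa-dev-van.mydomain.net \
    --rightnode=dc1-ipa-dev-nvan.mydomain.net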

Replication actually seems to be proceeding smoothly now, and instead of the 
hundreds of error log entries per second that I had reported in my earlier 
posts, I am only getting about 3 every 5 minutes.  The bugs that were present 
in 4.2.0 and 4.2.3 seem to be almost entirely gone.

I have run the new topology suffix verification commands and they say 
everything is OK.
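
For the record, those are the checks against the domain and ca suffixes:

# Verify the replication topology for each suffix
ipa topologysuffix-verify domain
ipa topologysuffix-verify ca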

I still get these errors in batches of 3, but they don't seem to be doing 
anything harmful in terms of my system's ability to operate and replicate 
properly:

[17/Jan/2016:01:07:27 -0800] attrlist_replace - attr_replace (nsslapd-referral, 
ldap://dc1-ipa-dev-nvan.mydomain.net:389/o%3Dipaca) failed.

-----Original Message-----
From: freeipa-users-boun...@redhat.com 
[mailto:freeipa-users-boun...@redhat.com] On Behalf Of Nathan Peters
Sent: January-15-16 10:00 AM
To: Ludwig Krispenz
Cc: freeipa-users@redhat.com
Subject: Re: [Freeipa-users] Replication failing on FreeIPA 4.2.0

No dice on the rebuild and RUV cleaning. I'm still getting a pile of these on 
dc1-van : 

[15/Jan/2016:17:55:25 +0000] NSMMReplicationPlugin - 
agmt="cn=meTodc1-ipa-dev-nvan.mydomain.net" (dc1-ipa-dev-nvan:389): Skipping 
update operation with no message_id (uniqueid 
6e6784a0-b5c911e5-b1f1cd78-f19552bb, CSN 569932db000000040000):

I'm also getting these on dc1-nvan: 

[15/Jan/2016:17:45:36 +0000] attrlist_replace - attr_replace (nsslapd-referral, 
ldap://dc1-ipa-dev-van.mydomain.net:389/o%3Dipaca) failed.




-----Original Message-----
From: Ludwig Krispenz [mailto:lkris...@redhat.com] 
Sent: January-15-16 12:19 AM
To: Nathan Peters
Cc: Rob Crittenden; freeipa-users@redhat.com
Subject: Re: [Freeipa-users] Replication failing on FreeIPA 4.2.0


On 01/15/2016 08:32 AM, Nathan Peters wrote:
> I think I've finally started to make some progress on this.  I did a lot of 
> googling and found some stuff to run manually in 389 ds through ldapmodify 
> commands to clean RUVs.  During this process the server crashed and when it 
> came back online, suddenly all my ghost RUVs were visible through 
> ipa-replica-manage list-ruv.  It was really strange, I had like 5 of them 
> from winsync agreements that kept failing and needing re-initialization, and 
> another 5 from my earlier re-installations of the 2 other domain controllers.
>
> I ran some more ruv cleanup commands through ldap and they all appear to be 
> gone.  I'm not sure how the crash suddenly made them visible though or why 
> they had to be cleaned through ldapmodify directly and ipa-replica-manage 
> could neither see nor clean them.
After a crash the RUV could be rebuilt from the changelog, and the changelog 
could contain references to cleaned ReplicaIds, so they came to life again. 
The cleanallruv task was enhanced to also clean the changelog, but this fix is 
in 1.3.4.2+.

