After a bunch more troubleshooting I finally have logs that are error free on 
all 4 servers :-)

I couldn't find anything really useful on Google about this particular error : 
attrlist_replace - attr_replace (nsslapd-referral, 
ldap:// failed

So I am going to write about my experiences fixing it.  There was a clue in a 
thread here :

But if you are like me and chose FreeIPA because you wanted to spend your time 
managing a lot of computers without worrying about the gory technical details 
of 389 directory server, the answer given in that thread needs some explaining.

On every domain controller in your network run this command : 

ldapsearch -D "cn=directory manager" -W -b "o=ipaca" 

In the output on each server, look for the following key.  It tells you the 
server's current replica ID :

nscpentrywsi: nsDS5ReplicaId: 1195

Now look for the ruv entries that look like this : 

nscpentrywsi: nsds50ruv: {replica 1195 ldap://dc1-ipa-dev-nvan.mydomain
 .net:389} 569afd7c000004ab0000 569b5b0e000004ab0000

Any of those RUVs whose ID number (the number after the word replica) doesn't 
match the ID of one of your current servers needs to be deleted.  They are old 
entries left over from previously deleted agreements.  Don't delete any ID that 
one of your servers currently identifies itself as, though; that will crash the 
server.  Use the following ldapmodify command to delete the old ones (where 21 
in CLEANRUV21 is the replica ID you want to clean) : 

ldapmodify -x -D "cn=directory manager" -W <<EOF
dn: cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
changetype: modify
replace: nsds5task
nsds5task: CLEANRUV21
EOF

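The comparison described above (collect each server's own nsDS5ReplicaId, then 
flag every nsds50ruv replica number that no server claims) can be sketched in 
Python.  This is only an illustrative helper, not part of FreeIPA or 389 DS: 
the function names are made up, and it assumes you have saved the ldapsearch 
output from each server as text.  It only reports candidate IDs; it does not 
delete anything.

```python
import re

# Matches "{replica 1195 ldap://..." but not "{replicageneration}".
RUV_RE = re.compile(r"nsds50ruv:\s*\{replica\s+(\d+)\s+ldap", re.IGNORECASE)
OWN_ID_RE = re.compile(r"nsDS5ReplicaId:\s*(\d+)", re.IGNORECASE)


def unfold_ldif(text):
    """Join LDIF continuation lines (a newline followed by one space)."""
    return re.sub(r"\r?\n ", "", text)


def replica_ids(ldif_text):
    """Return (IDs the server claims for itself, IDs found in its RUV)."""
    text = unfold_ldif(ldif_text)
    own = {int(m) for m in OWN_ID_RE.findall(text)}
    ruv = {int(m) for m in RUV_RE.findall(text)}
    return own, ruv


def stale_ruv_ids(outputs):
    """Given one ldapsearch output per server, list RUV IDs no server owns."""
    live, seen = set(), set()
    for output in outputs:
        own, ruv = replica_ids(output)
        live |= own
        seen |= ruv
    return sorted(seen - live)
```

Pass it the output from every server at once, so that an ID owned by any 
current server is never reported, and check the resulting list by hand before 
running a CLEANRUV task against it.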
I noticed more strange behavior here because even after I deleted every old 
RUV, one of them came back all by itself.  I assumed it must be part of an 
agreement somewhere in the system and was getting re-created automatically, so 
I went hunting for more info.  I noticed that the number of distinct servers 
listed in the error log message on each server exactly matched the number of 
maxcsn entries in the ldap output of the tombstone search on that server.  The 
entries looked like this : 

nscpentrywsi: nsds5agmtmaxcsn: o=ipaca;;;389
nscpentrywsi: nsds5agmtmaxcsn: 

From the word unavailable in those entries I could tell the server was having 
trouble getting a CSN, but I didn't know how to delete them safely with ldap 
syntax.  Luckily, the new 4.3.0 interface calls these maxcsn entries segments.  
Removing them using the web ui is kind of roundabout, but works eventually.  
On each server, go to the web ui and, one at a time, delete and re-create all 
segments in the ca topology USING THE TEXT BASED ONE, NOT THE GRAPHIC ONE (this 
requires domain level 1).  This works because the command to delete a domain 
level segment also doubles as a command to clean local segments that are still 
in the old local part of the ldap tree from domain level 0. 

You still have to repeat it on each server (which is kind of funny because you 
are deleting the domain level objects multiple times, but only because you need 
to cause the local trigger on each server).

I noticed that after re-creation the names of the maxcsn entries in that ldap 
query result are much more uniform.  There are no 'masterAgreement' csn types, 
all member servers that are not the CA master have no entries at all, even 
after replication, and on the master, they are all labelled with the -to- 
syntax instead of the pki syntax.  I also noticed that some of my old invalid 
agreements had the same server name on both sides of the -to- and now they all 
perfectly match the segment names in the web ui.
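
The naming patterns described above lend themselves to a quick check.  The 
sketch below is a hypothetical helper, not a FreeIPA tool.  It assumes you have 
already copied the agreement names out of the ldapsearch output into a list, 
and it relies only on the -to- convention mentioned above.  The three labels it 
returns are invented for this example.

```python
def classify_agreement(name):
    """Label an agreement name by the patterns described in this message."""
    left, sep, right = name.partition("-to-")
    if not sep:
        # No "-to-" at all, e.g. the older masterAgreement style names.
        return "old-style"
    if left == right:
        # Same server on both sides of "-to-": an invalid agreement.
        return "self-referencing"
    return "segment-style"


def suspicious_agreements(names):
    """Return the names that do not look like clean host-to-host segments."""
    return [n for n in names if classify_agreement(n) != "segment-style"]
```

After the segments have been re-created, suspicious_agreements() would be 
expected to return an empty list on every server.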

I'm assuming all the bugs in 4.1.4 and 4.2.0 and 4.2.3 created a lot of garbage 

Luckily, with the tools in 4.3.0 those can all be removed.

I have now been staring at logs that have zero errors for over 30 minutes, and 
I was previously getting hundreds per second.

Although this is great news for me, it is not great news for anyone on CentOS 
or RHEL who is hitting the same category of bugs (there were definitely several) 
that I encountered while trying to fix these replication issues, because there 
is no upgrade path to 4.3.0 without switching to Fedora.

-----Original Message-----
From: Nathan Peters 
Sent: January-17-16 1:10 AM
To: Nathan Peters
Subject: RE: [Freeipa-users] Replication failing on FreeIPA 4.2.0

After some amount of work, I was able to get my system back to a state where it 
seems to be replicating ok, but not with FreeIPA 4.2.0.  Because this was a 
production system with several hundred users and computers attached to it, a 
wipe of the domain was not an option so I decided to chance that the new 
replication topology features would help.

I replaced each CentOS 7 domain controller with a Fedora 23 FreeIPA 4.2.3 host 
and while doing so I noticed an odd behavior of the RUVs.  I know about the 
current bug where deleting a replica doesn't delete its RUV and I experienced 
that. I would run a command like this :

dn: cn=clean 4, cn=cleanallruv, cn=tasks, cn=config
objectclass: top
objectclass: extensibleObject
replica-base-dn: dc=mydomain,dc=net
replica-id: 4
replica-force-cleaning: yes
cn: clean 4

It would fail only if I was not in a current agreement with the new Fedora RUV 
for that host.  I.e., if the old CentOS host had a RUV of 4, and the new Fedora 
host 15, and I was in an agreement with 15, that ldap code would delete 4, but 
if I was not in an agreement with 15, it would fail.
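
For repeated cleanups, the task entry shown earlier in this message can be 
generated rather than typed by hand.  A minimal Python sketch follows.  The 
function name and the live_ids guard are assumptions added for illustration; 
the attributes are exactly those in the entry above.  The resulting text is 
meant to be fed to ldapmodify as an add operation (for example with its -a 
option).

```python
def cleanallruv_ldif(replica_id, base_dn, live_ids=(), force=True):
    """Build the cleanallruv task entry for one old replica ID."""
    if replica_id in set(live_ids):
        # Never clean an ID that a current server still uses.
        raise ValueError("replica id %d belongs to a live server" % replica_id)
    name = "clean %d" % replica_id
    lines = [
        "dn: cn=%s, cn=cleanallruv, cn=tasks, cn=config" % name,
        "objectclass: top",
        "objectclass: extensibleObject",
        "replica-base-dn: %s" % base_dn,
        "replica-id: %d" % replica_id,
        "replica-force-cleaning: %s" % ("yes" if force else "no"),
        "cn: %s" % name,
    ]
    return "\n".join(lines) + "\n"
```

Calling cleanallruv_ldif(4, "dc=mydomain,dc=net") reproduces the entry above, 
while passing live_ids=[15] makes an accidental attempt to clean 15 raise an 
error instead of producing a task.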

After a while I had every server in an agreement with all others and got all 
the old RUVs cleared.

I was still experiencing strange error messages in my logs with FreeIPA 4.2.3 
so I decided to go all the way to 4.3.0.

Here are the 4.2.3 errors :

[16/Jan/2016:22:29:12 -0800] NSMMReplicationPlugin - 
replica_replace_ruv_tombstone: failed to update replication update vector for 
replica dc=mydomain,dc=net: LDAP error - 53
[16/Jan/2016:22:29:13 -0800] NSMMReplicationPlugin - agmt_delete: begin
[16/Jan/2016:22:32:51 -0800] slapi_ldap_bind - Error: could not bind id 
[cn=Replication Manager,ou=csusers,cn=config] 
authentication mechanism [SIMPLE]: error 32 (No such object) errno 0 (Success)

On 4 servers, 3 upgrades to 4.3.0 went smoothly, and 1 just hung during the 
%post section of the dnf install for an hour with the ns-slapd process taking 
100% cpu on all 4 cores until I stopped it.  A subsequent ipa-server-upgrade fixed 

With the new replication topology management graphs and controls in the ui, I 
was able to find some missing segments and replace some that were for some 
reason only 1 way.

Replication seems to actually be proceeding smoothly and now instead of getting 
the hundreds of error log entries per second that I had reported in my earlier 
posts, I am only getting about 3 every 5 minutes.  The bugs that were present 
in 4.2.0 and 4.2.3 seem to be almost entirely gone.
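
A rough way to put numbers on a change like this is to count error log lines 
per minute.  The Python sketch below assumes the timestamp format seen in the 
log excerpts in this thread ([16/Jan/2016:22:29:12 -0800]) and English month 
abbreviations.  The function name and the needle parameter are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp at the start of a log line, e.g. [16/Jan/2016:22:29:12 -0800]
STAMP_RE = re.compile(r"^\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def errors_per_minute(lines, needle=""):
    """Count timestamped lines containing needle, grouped by minute."""
    counts = Counter()
    for line in lines:
        match = STAMP_RE.match(line)
        if not match or needle not in line:
            continue  # wrapped continuation line, or not the error we want
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        counts[stamp.strftime("%Y-%m-%d %H:%M")] += 1
    return counts
```

Running it over the error log with needle="attrlist_replace" before and after 
the upgrade gives a per-minute count that can be compared directly.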

I have run the new topology suffix verification commands and they say 
everything is ok.

I still get these errors in batches of 3, but they don't seem to be doing 
anything harmful in terms of my system's ability to operate and replicate 
properly :

[17/Jan/2016:01:07:27 -0800] attrlist_replace - attr_replace (nsslapd-referral, 
ldap:// failed.

-----Original Message-----
From: Nathan Peters
Sent: January-15-16 10:00 AM
To: Ludwig Krispenz
Subject: Re: [Freeipa-users] Replication failing on FreeIPA 4.2.0

No dice on the rebuild and RUV cleaning. I'm still getting a pile of these on 
dc1-van : 

[15/Jan/2016:17:55:25 +0000] NSMMReplicationPlugin - 
agmt="" (dc1-ipa-dev-nvan:389): Skipping 
update operation with no message_id (uniqueid 
6e6784a0-b5c911e5-b1f1cd78-f19552bb, CSN 569932db000000040000):

I'm also getting these on dc1-nvan: 

[15/Jan/2016:17:45:36 +0000] attrlist_replace - attr_replace (nsslapd-referral, 
ldap:// failed.

-----Original Message-----
From: Ludwig Krispenz
Sent: January-15-16 12:19 AM
To: Nathan Peters
Cc: Rob Crittenden;
Subject: Re: [Freeipa-users] Replication failing on FreeIPA 4.2.0

On 01/15/2016 08:32 AM, Nathan Peters wrote:
> I think I've finally started to make some progress on this.  I did a lot of 
> googling and found some stuff to run manually in 389 ds through ldapmodify 
> commands to clean RUVs.  During this process the server crashed and when it 
> came back online, suddenly all my ghost RUVs were visible through 
> ipa-replica-manage list-ruv.  It was really strange, I had like 5 of them 
> from winsync agreements that kept failing and needing re-initialization, and 
> another 5 from my earlier re-installations of the 2 other domain controllers.
> I ran some more ruv cleanup commands through ldap and they all appear to be 
> gone.  I'm not sure how the crash suddenly made them visible though or why 
> they had to be cleaned through ldapmodify directly and ipa-replica-manage 
> could neither see nor clean them.
After a crash the RUV could be rebuilt from the changelog, and the changelog 
could contain references to cleaned ReplicaIds and so they came to life again. 
The cleanallruv task was enhanced to also clean the changelog, but this fix is 

Manage your subscription for the Freeipa-users mailing list:
Go to for more info on the project
