On 04/13/2012 03:40 PM, Dan Scott wrote:
On Fri, Apr 13, 2012 at 16:41, Rich Megginson <rmegg...@redhat.com> wrote:
On 04/13/2012 02:30 PM, Dan Scott wrote:
On Fri, Apr 13, 2012 at 15:24, Rich Megginson <rmegg...@redhat.com> wrote:
It's not a problem until it's a problem :-)  I would go ahead and run
I cleaned up a load of these entries, but now I think I've broken the
replication between fileserver1 and 3:

[13/Apr/2012:15:57:56 -0400] NSMMReplicationPlugin - changelog program
- agmt="cn=meTofileserver3.ecg.mit.edu" (fileserver3:389): CSN
4f5039960000002b0000 not found, we aren't as up to date, or we purged
[13/Apr/2012:15:57:56 -0400] NSMMReplicationPlugin -
agmt="cn=meTofileserver3.ecg.mit.edu" (fileserver3:389): Data required
to update replica has been purged. The replica must be reinitialized.
[13/Apr/2012:15:57:56 -0400] NSMMReplicationPlugin -
agmt="cn=meTofileserver3.ecg.mit.edu" (fileserver3:389): Incremental
update failed and requires administrator action

[13/Apr/2012:16:19:38 -0400] NSMMReplicationPlugin - changelog program
- agmt="cn=meTofileserver1.ecg.mit.edu" (fileserver1:389): CSN
4f031e76001d000b0000 not found, we aren't as up to date, or we purged
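
The CSNs in these messages can be decoded to see which replica ID
originated the missing change and roughly when. A minimal sketch,
assuming the usual 389 Directory Server CSN layout of 20 hex digits
(8 for the timestamp in seconds since the epoch, 4 for the sequence
number, 4 for the replica ID, 4 for the sub-sequence number).
decode_csn is a hypothetical helper name, not part of any IPA or
389 DS tool:

```python
from datetime import datetime, timezone


def decode_csn(csn):
    """Split a 20-hex-digit CSN into its fields (the layout is an assumption)."""
    if len(csn) != 20:
        raise ValueError("expected 20 hex digits, got %d" % len(csn))
    timestamp = int(csn[0:8], 16)
    return {
        "timestamp": timestamp,
        "time_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "seqnum": int(csn[8:12], 16),
        "replica_id": int(csn[12:16], 16),
        "subseqnum": int(csn[16:20], 16),
    }


# The two CSNs reported as "not found" in the logs above.
for csn in ("4f5039960000002b0000", "4f031e76001d000b0000"):
    fields = decode_csn(csn)
    print(csn, "replica ID", fields["replica_id"],
          "at", fields["time_utc"].isoformat())
```

The replica IDs it prints can then be compared against the replica
elements listed in the nsds50ruv attribute on each server.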

Is it safe to run:
[root@fileserver3 ~]# ipa-replica-manage re-initialize --from

I want to make sure I get it the correct way round!

Are you sure that fileserver1 has the correct data?
Maybe? :) I've snapshotted both VMs and re-initialized from
fileserver1 - looking good so far.

I cleaned up all the "ruv_compare_ruv: RUV [changelog max RUV] does
not contain element" errors in the logs for each of fileservers 1, 2
and 3. The ldapsearch for
is still showing entries though. Is that OK?

The entry should exist, but the deleted servers should not be present in the nsds50ruv attribute.
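
One way to check that is to compare the hosts named in the nsds50ruv
values against the servers that are still in service. A rough sketch,
assuming each replica element looks like
"{replica <id> ldap://<host>:<port>} <min CSN> <max CSN>" and that the
values have already been copied out of the ldapsearch output. The
function name, the sample values and the host names are all
hypothetical:

```python
import re

# Matches "{replica <id> ldap://<host>[:<port>]}" at the start of a value.
RUV_ELEMENT = re.compile(r"^\{replica\s+(\d+)\s+ldaps?://([^:}/\s]+)(?::\d+)?\}")


def stale_ruv_elements(ruv_values, live_hosts):
    """Return (replica_id, host) pairs whose host is not in live_hosts."""
    stale = []
    for value in ruv_values:
        match = RUV_ELEMENT.match(value.strip())
        if match is None:
            continue  # e.g. the {replicageneration} value
        replica_id, host = int(match.group(1)), match.group(2)
        if host not in live_hosts:
            stale.append((replica_id, host))
    return stale


# Hypothetical sample values, not taken from the servers in this thread.
sample = [
    "{replicageneration} 4f0b0000000000010000",
    "{replica 4 ldap://fileserver1.example.com:389} "
    "4f0b0001000000040000 4f70a9e5000000040000",
    "{replica 9 ldap://oldserver.example.com:389}",
]
live = {"fileserver1.example.com"}
print(stale_ruv_elements(sample, live))
```

Anything this reports is a candidate for cleanup; an empty list means
only the expected servers remain in the RUV.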

Also, the PKI-CA error logs are showing RUV errors. Should I clean
those too? I guess that I need to modify the commands (-b o=ipaca -p
7389 -h localhost).


fileserver3's /var/log/dirsrv/slapd-PKI-IPA/errors contains lots of:
[13/Apr/2012:13:52:50 -0400] slapi_ldap_bind - Error: could not send
startTLS request: error -1 (Can't contact LDAP server) errno 107
(Transport endpoint is not connected)

This is a real connection error - could be cert or hostname lookup
How do I find out if it's cert or hostname lookup? Which hostname?
Fileserver3 runs DNS, and it seems to be working fine.

Try ldapsearch - on server3

LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-PKI-IPA ldapsearch -x -ZZ -H
ldap://server2.fqdn -D "cn=directory manager" -W -s base -b ""

If that works, check to make sure the replication agreement has the

If that doesn't work, use ldapsearch -d 1 -x ..... to get further
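
To separate a hostname lookup problem from a plain connection problem
before looking at certificates, the two steps can be tried one at a
time. A small sketch using only the Python standard library;
check_endpoint is a hypothetical helper, and it does not attempt
startTLS, so it says nothing about the certificate itself:

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Return (status, detail); status is 'dns-failure', 'connect-failure' or 'ok'."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # The name could not be resolved at all.
        return "dns-failure", str(exc)
    last_error = None
    for family, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
            return "ok", "connected to %s" % addr[0]
        except OSError as exc:
            # Name resolved, but this address refused or timed out.
            last_error = exc
    return "connect-failure", str(last_error)
```

Calling check_endpoint("server2.fqdn", 389) on server3, with the real
host name and port from the replication agreement substituted in,
would show which stage fails. If it returns "ok", name resolution and
the TCP connection are fine and the certificate side, which the
ldapsearch -ZZ test exercises, is the remaining suspect.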
The replication agreements (according to ipa-replica-manage) all have
the correct host names - I'm not sure what ldapsearch command to run
to check the replication agreements.

ipa-replica-manage list? Or something like that?
That's what I was using - they are all correct.

Ok. And the LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-PKI-IPA ldapsearch ... is working?

The /var/log/dirsrv/slapd-ECG-MIT-EDU/errors log is now full of:

[13/Apr/2012:14:59:19 -0400] NSMMReplicationPlugin - conn=1 op=571
csn=4f70a9e5000100060000: Can't created glue entry
uniqueid=6949d104-775b11e1-abce82a1-a45dd3c3, error 68

Should I delete the LDAP entry which is trying to replicate
fileserver2 with fileserver4?

Yes.  And it may be due to the fact that the entry it is trying to delete
has those tombstone children that have to be deleted too.
OK, I'll see how this goes, once the tombstones are gone.
The tombstones for ECG-MIT-EDU are gone now, but I'm still receiving
this message in the logs.

I think that's enough for this week - I'll look into it more next
week. Thanks for your help, have a good weekend.


