On 04 May 2014, at 9:35 PM, David Boreham <[email protected]> wrote:

> It should be possible to add an N+1th replica to an N-node deployment. 
> Replication agreements are peer-to-peer, so you just add a new replication 
> agreement from each of the servers you want to feed changes to the N+1th 
> (typically all of them).

What I've learned so far:

- servera has "syntax checking" switched off, and contains data with syntax 
errors. The data is 15 years old.
- serverb has "syntax checking" switched on, but has successfully been able to 
replicate in the past. Now replication is broken with serverb.
- serverc has "syntax checking" switched on, and has never been able to 
replicate. Serverc is brand new.
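For reference, the "syntax checking" switch is the `nsslapd-syntaxcheck` attribute in `cn=config`, and it can be changed at runtime. A sketch of how to flip it (fed to ldapmodify as Directory Manager), e.g. to switch it off on serverc so that it matches servera while importing the old data:

```ldif
# Turns syntax checking off (matching servera's configuration)
dn: cn=config
changetype: modify
replace: nsslapd-syntaxcheck
nsslapd-syntaxcheck: off
```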

What appears to be happening is that during the replication process, an LDAP 
operation that is accepted on servera is being rejected by serverc. The 
replication code is brittle: it has not been written to handle any kind of 
error during a bulk import, so it fails abruptly with "ERROR bulk 
import abandoned" and no further explanation. The error that actually triggered 
the abort is only visible by turning trace logging on.
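In case it saves someone else the hunt: the trace output comes from the error-log level, which can also be raised at runtime. A sketch, assuming the documented level value (8192 is the replication-debugging level in the 389ds docs):

```ldif
# Fed to ldapmodify; enables replication debug logging in the error log.
# Set it back to 0 (the default) afterwards, as it is very verbose.
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 8192
```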

This abrupt failure of replication is ignored by both servera and serverc, 
which each go into their normal incremental-update state. Because the initial 
initialization has failed, you get the following two messages:

[05/May/2014:11:19:01 +0200] NSMMReplicationPlugin - 
replica_replace_ruv_tombstone: failed to update replication update vector for 
replica o=Foo,c=ZA: LDAP error - 1

[05/May/2014:04:25:31 -0500] NSMMReplicationPlugin - agmt="cn=Agreement 
serverc.example.com" (serverc:636): Replica has a different generation ID than 
the local data.
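For completeness, the initialization I am attempting is triggered by writing to the agreement entry on the supplier (servera). A sketch; the agreement DN below is guessed from the log message and the suffix above, so adjust it to match the actual entry:

```ldif
# Fed to ldapmodify on servera; kicks off a total (bulk) update of serverc.
dn: cn=Agreement serverc.example.com,cn=replica,cn="o=Foo,c=ZA",cn=mapping tree,cn=config
changetype: modify
replace: nsds5BeginReplicaRefresh
nsds5BeginReplicaRefresh: start
```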

When you google the above messages (which is where I started), the advice you 
find is "initialize the supplier/consumer/hub", which throws you, because 
initializing the supplier/consumer/hub was the exact task you were trying to do.

I have still not been able to get serverc to initialize from servera; I will 
keep going through the trace logging to see what the next error is. It would 
appear that 389ds replication has become stricter over time: older versions of 
389ds allowed things through that newer versions no longer do. As a result, 
older directories are at risk of sudden failure, as latent mistakes that 
weren't fatal before now are.
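One thing that might help others in the same boat: recent 389ds versions ship a syntax-validation task that reports which existing entries would now be rejected, without modifying anything. A sketch, assuming your version has the task (the entry location and attributes here are taken from the Red Hat DS documentation, and the task name `check-foo` is made up; check the docs for your release):

```ldif
# Fed to ldapadd; scans the suffix and logs entries whose attribute
# values violate their declared syntax.
dn: cn=check-foo,cn=syntax validate,cn=tasks,cn=config
objectClass: top
objectClass: extensibleObject
cn: check-foo
basedn: o=Foo,c=ZA
filter: (objectClass=*)
```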

Regards,
Graham
--

--
389 users mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/389-users