Hi Rich,
Thanks a lot for your response. Please find the sample reproducer details
below. I am not sure about how to log a bug. I will explore and do it.
Reproducer:
Step-1:
Set up a topology in which the Master replicates to the Slave and the Slave
replicates to the Consumer.
Master -> Slave -> Consumer.
Step-2:
Make sure that all three servers are in sync at this point. For example, assume
all are in sync up to CSN5 (5 records were added to the Master, CSN1 to CSN5).
Step-3:
Delete the replication agreement from Master to Slave and also from Slave to
consumer.
Step-4:
Promote the Slave to master. Promotion steps are given below.
- Delete the Supplier DN (cn=suppdn,cn=config) from the Slave.
- Delete the "cn=replica" entry for the suffix "o=USA" using ldapmodify.
  As a result, the changelog file is deleted as well.
  Ex:
  dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
  changetype: delete
- Modify the cn=o=USA,cn=mapping tree,cn=config entry as below.
  Ex:
  dn: cn=o=USA,cn=mapping tree,cn=config
  changetype: modify
  replace: nsslapd-state
  nsslapd-state: backend

  dn: cn=o=USA,cn=mapping tree,cn=config
  changetype: modify
  delete: nsslapd-referral
- Recreate the "cn=replica" entry for the suffix as below.
  Please assign the same nsDS5ReplicaId value that the original master had.
  In my case, the original master's replica ID was 10.
  dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
  changetype: add
  objectClass: nsds5replica
  objectClass: top
  nsDS5ReplicaRoot: o=USA
  nsDS5ReplicaType: 3
  nsDS5Flags: 1
  nsDS5ReplicaId: 10
  nsds5ReplicaPurgeDelay: 1
  nsds5ReplicaTombstonePurgeInterval: -1
  cn: replica
- Restart the slapd process. The Slave now becomes the Master.
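The promotion steps above can be sketched as a small Python helper that renders
the LDIF for one suffix, so that the same suffix and replica ID are used in
every record. This is only an illustration: the function name promotion_ldif is
hypothetical, the DNs and attribute values are copied from the steps above, and
the DN form for the mapping tree entry is taken as written in this message (a
real server may require the suffix to be quoted or escaped inside that DN). The
output is meant to be reviewed and then fed to ldapmodify by hand.

```python
def promotion_ldif(suffix: str, replica_id: int,
                   supplier_dn: str = "cn=suppdn,cn=config") -> str:
    """Render the LDIF records from the promotion steps for one suffix.

    All values mirror the steps in this message; nothing here talks to a server.
    """
    mapping_dn = f"cn={suffix},cn=mapping tree,cn=config"
    replica_dn = f"cn=replica,{mapping_dn}"
    records = [
        # Remove the supplier bind DN from the server being promoted.
        [f"dn: {supplier_dn}", "changetype: delete"],
        # Remove the old replica entry (per the steps, this drops the changelog).
        [f"dn: {replica_dn}", "changetype: delete"],
        # Turn the suffix back into a plain backend.
        [f"dn: {mapping_dn}", "changetype: modify",
         "replace: nsslapd-state", "nsslapd-state: backend"],
        # Drop the referral that pointed at the old master.
        [f"dn: {mapping_dn}", "changetype: modify",
         "delete: nsslapd-referral"],
        # Recreate the replica entry with the original master's replica ID.
        [f"dn: {replica_dn}", "changetype: add",
         "objectClass: nsds5replica", "objectClass: top",
         f"nsDS5ReplicaRoot: {suffix}", "nsDS5ReplicaType: 3",
         "nsDS5Flags: 1", f"nsDS5ReplicaId: {replica_id}",
         "nsds5ReplicaPurgeDelay: 1",
         "nsds5ReplicaTombstonePurgeInterval: -1", "cn: replica"],
    ]
    # LDIF records are separated by a blank line.
    return "\n\n".join("\n".join(record) for record in records) + "\n"


if __name__ == "__main__":
    print(promotion_ldif("o=USA", 10), end="")
```

Generating the records from one suffix argument avoids mixing two different
suffixes between the delete and the add of the "cn=replica" entry.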
Am I missing anything in the promotion procedure, or is this not the right way
to promote a replica?
Step-5:
Add a replication agreement between the Slave (the newly promoted Master) and
the Consumer. At this point both the Slave and the Consumer are in sync up to
CSN5. Do not initialize the consumer when creating the agreement.
Slave (newly promoted Master) -> Consumer.
Step-6:
Add 5 more entries to the Slave that was promoted to Master above. Assume the
CSNs for these 5 entries are CSN6 to CSN10.
Step-7:
Now you will see that, of the last 5 entries, only the last few are replicated,
and replication does not halt.
Regards,
Jyoti
From: Rich Megginson [mailto:[email protected]]
Sent: Friday, October 28, 2011 10:54 PM
To: General discussion list for the 389 Directory server project.
Cc: Das, Jyoti Ranjan (STSD)
Subject: Re: [389-users] Data inconsitency during replication
On 10/20/2011 12:45 AM, Das, Jyoti Ranjan (STSD) wrote:
Hi,
I am new to 389 Directory Server. Could you please help me with the query
below?
Thank you very much in advance.
Problem statement:
Data is lost during replication between supplier and consumer when the master
changelog db file has been deleted for some reason, the consumer has been
imported with stale data, and the consumer is not initialized when the new
replication agreement is created. The test scenario is given below.
Test scenario:
Steps:
Step-1:
Topology:
Supplier -----------Replication agreement-----------------> Hub
Both replicas are in sync at this time. For example, five entries have been
added, with CSN1 to CSN5.
Step-2:
Take an export with db2ldif using the "-r" option from the Hub replica.
Step-3:
Add another 5 entries on the supplier. Assume their CSNs are CSN6 to CSN10.
Step-4:
Delete the replication agreements.
Before or after CSN6 to CSN10 have been replicated to the Hub?
Step-5:
Delete the master changelog db file from the changelogdb directory.
Supplier or Hub?
Step-6:
Add another 5 entries on the supplier. Assume their CSNs are CSN11 to CSN15.
Step-7:
Import the LDIF file taken in Step-2 into the Hub replica (this initializes the
consumer with stale data).
Step-8:
Create the replication agreement between master and hub with the "do not
initialize" option.
Step-9:
Now we see data loss from CSN6 to CSN14. Only the entry with CSN15 is
replicated to the consumer, and replication then continues successfully.
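The gap in this scenario can be made concrete with a toy model in Python. This
is a sketch under stated assumptions, not a description of the server: plain
integers stand in for CSNs, the function name unrecoverable_changes is
hypothetical, and it is assumed that the changelog was deleted on the supplier
and afterwards holds only the changes made after the deletion (which server the
changelog was deleted on is still an open question above).

```python
def unrecoverable_changes(supplier_csns, consumer_csns, changelog_csns):
    """Return the changes the consumer lacks that the changelog cannot replay.

    Toy model: integers stand in for CSNs.
    """
    needed = set(supplier_csns) - set(consumer_csns)
    return sorted(needed - set(changelog_csns))


# Supplier holds CSN1..CSN15; the hub was re-imported from the stale export.
supplier = range(1, 16)
consumer = range(1, 6)
# Assumption: the changelog only contains changes made after it was deleted.
changelog = range(11, 16)

print(unrecoverable_changes(supplier, consumer, changelog))
# prints [6, 7, 8, 9, 10]
```

Under these assumptions CSN6 to CSN10 cannot be replayed from the changelog at
all, so only a consumer initialization could restore them. The model does not
explain why CSN11 to CSN14 were skipped as well.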
Questions:
Is it correct in this scenario to continue with replication despite the data
loss, instead of halting replication?
From the code analysis:
File: "ldapserver/ldap/servers/plugins/replication/cl5_api.c"
If the requested CSN is not found in the changelog db file and is also not in
the purge list, the code makes the following assumption and continues with
replication:
/* there is a special case which can occur just after migration - in this case,
the consumer RUV will contain the last state of the supplier before migration,
but the supplier will have an empty changelog, or the supplier changelog will
not contain any entries within the consumer min and max CSN - also, since
the purge RUV contains no CSNs, the changelog has never been purged
ASSUMPTIONS - it is assumed that the supplier had no pending changes to send
to any consumers; that is, we can assume that no changes were lost due to
either changelog purging or database reload - bug# 603061 -
[email protected] */
Would it be correct in this scenario to halt replication with a fatal error
message in the error log file?
Probably, but then this code would have to be a lot smarter to figure out that
the problem is due to stale data being imported into the consumer. Please file
a bug with exact steps to reproduce this problem.
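The two behaviours under discussion can be contrasted in a short Python sketch.
It is a toy model only and does not reproduce the logic in cl5_api.c: integers
stand in for CSNs, the names Decision and decide are hypothetical, the "strict"
flag is an invented switch for the halting behaviour asked about above, and
halting when the purge list is non-empty is an assumption of the model.

```python
from enum import Enum


class Decision(Enum):
    SEND_AFTER_ANCHOR = "send the changes that follow the consumer's CSN"
    ASSUME_NOTHING_LOST = "continue, assuming no changes were lost"
    HALT = "stop with a fatal error; the consumer needs initialization"


def decide(changelog_csns, purged_csns, consumer_max_csn, strict=False):
    """Toy decision for a consumer whose newest known change is consumer_max_csn."""
    if consumer_max_csn in changelog_csns:
        # Normal case: the consumer's position exists in the changelog.
        return Decision.SEND_AFTER_ANCHOR
    if purged_csns:
        # Assumption of this model: a purged changelog cannot prove continuity.
        return Decision.HALT
    # Position missing and nothing was ever purged: the "just after migration"
    # special case quoted above. The lenient policy trusts that nothing was lost.
    return Decision.HALT if strict else Decision.ASSUME_NOTHING_LOST


# Stale consumer at CSN5, changelog recreated with CSN11..CSN15, nothing purged.
print(decide({11, 12, 13, 14, 15}, set(), 5))               # Decision.ASSUME_NOTHING_LOST
print(decide({11, 12, 13, 14, 15}, set(), 5, strict=True))  # Decision.HALT
```

In this model the strict policy also halts in the legitimate migration case,
because the inputs alone cannot tell it apart from a stale import. That is the
extra knowledge the reply above says the code would need.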
Regards,
Jyoti
--
389 users mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/389-users