Re: [389-users] Data inconsitency during replication

Das, Jyoti Ranjan (STSD) Mon, 31 Oct 2011 02:09:00 -0700

Hi Rich,

Thanks a lot for your response. Please find the sample reproducer details 
below. I am not sure how to log a bug; I will look into it and file one.



Reproducer:


Step-1:

Set up a topology in which the Master replicates to the Slave and the Slave 
replicates to the Consumer.

Master -> Slave -> Consumer

Step-2:
Make sure that all three servers are in sync at this point. For example, assume 
all are in sync up to CSN5 (5 records have been added to the Master, with CSN1 
to CSN5).

Step-3:

Delete the replication agreements from Master to Slave and from Slave to 
Consumer.

Step-4:

Promote the Slave to master.  Promotion steps are given below.


- Delete the supplier DN (cn=suppdn,cn=config) from the Slave.

- Delete the "cn=replica" entry for the suffix "o=USA" using ldapmodify. As a 
  result, the changelog file is deleted too.

Ex:
dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
changetype: delete

- Modify the cn=o=USA,cn=mapping tree,cn=config entry as below.

Ex:
dn: cn=o=USA,cn=mapping tree,cn=config
changetype: modify
replace: nsslapd-state
nsslapd-state: backend

dn: cn=o=USA,cn=mapping tree,cn=config
changetype: modify
delete: nsslapd-referral

- Recreate the "cn=replica" entry for the suffix as below.

dn: cn=replica,cn=o=SWIFT,cn=mapping tree,cn=config
changetype: add
objectClass: nsds5replica
objectClass: top
nsDS5ReplicaRoot: o=SWIFT
nsDS5ReplicaType: 3
nsDS5Flags: 1
nsDS5ReplicaId: 10  ----> Please assign the same nsDS5ReplicaId value that the 
original master had. In my case, the original master's replica ID was 10.
nsds5ReplicaPurgeDelay: 1
nsds5ReplicaTombstonePurgeInterval: -1
cn: replica

- Restart the slapd process. The Slave now becomes the Master.

Am I missing anything in the promotion procedure, or is this not the right way 
to do the promotion?
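
For anyone who wants to replay the promotion steps, the LDIF records listed in 
Step-4 could be generated from one place, roughly as in the sketch below. The 
function name promotion_ldif is hypothetical and not part of 389 Directory 
Server. The DNs are copied in the unquoted form used in this message, and the 
sketch assumes a single suffix for every record (the records above show o=USA 
in most places and o=SWIFT in the last one).

```python
def promotion_ldif(suffix: str, replica_id: int) -> str:
    """Build the LDIF text for the promotion steps described in Step-4.

    Hypothetical helper: attribute names and values are copied from the
    message; the suffix and replica ID are parameters (assumption: one
    suffix is used consistently in every record).
    """
    mapping_dn = f"cn={suffix},cn=mapping tree,cn=config"
    replica_dn = f"cn=replica,{mapping_dn}"
    records = [
        # Remove the supplier bind DN from the former Slave.
        ["dn: cn=suppdn,cn=config", "changetype: delete"],
        # Remove the old replica entry.
        [f"dn: {replica_dn}", "changetype: delete"],
        # Switch the suffix from referral mode back to a plain backend.
        [f"dn: {mapping_dn}", "changetype: modify",
         "replace: nsslapd-state", "nsslapd-state: backend"],
        [f"dn: {mapping_dn}", "changetype: modify",
         "delete: nsslapd-referral"],
        # Recreate the replica entry, reusing the original master's ID.
        [f"dn: {replica_dn}", "changetype: add",
         "objectClass: nsds5replica", "objectClass: top",
         f"nsDS5ReplicaRoot: {suffix}",
         "nsDS5ReplicaType: 3", "nsDS5Flags: 1",
         f"nsDS5ReplicaId: {replica_id}",
         "nsds5ReplicaPurgeDelay: 1",
         "nsds5ReplicaTombstonePurgeInterval: -1",
         "cn: replica"],
    ]
    # LDIF records are separated by a blank line.
    return "\n\n".join("\n".join(record) for record in records) + "\n"


if __name__ == "__main__":
    print(promotion_ldif("o=USA", 10), end="")
```

The output could be saved to a file and applied with ldapmodify, followed by 
the slapd restart from the last bullet.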

Step-5:

Add the replication agreement between the Slave (the newly promoted Master) and 
the Consumer. At this point both Slave and Consumer are in sync up to CSN5. 
When creating the agreement, please do not initialize the consumer.

           Slave (newly promoted as Master) -> Consumer

Step-6:

Add 5 more entries to the Slave that was promoted to Master above. Let's 
assume the CSNs for these 5 entries are CSN6 to CSN10.

Step-7:

You will now see that, of these last 5 entries, only the last few get 
replicated, and replication continues without halting.


Regards,
Jyoti





From: Rich Megginson [mailto:[email protected]]
Sent: Friday, October 28, 2011 10:54 PM
To: General discussion list for the 389 Directory server project.
Cc: Das, Jyoti Ranjan (STSD)
Subject: Re: [389-users] Data inconsitency during replication

On 10/20/2011 12:45 AM, Das, Jyoti Ranjan (STSD) wrote:
Hi,

I am new to 389 Directory Server. Could you please help me with the query 
below?
Thank you very much in advance.

Problem statement:

Data is lost during replication between supplier and consumer when the master 
changelog db file has been deleted for some reason, the consumer has been 
imported with stale data, and the consumer is not initialized when the new 
replication agreement is created. The test scenario is given below.

Test scenario:
Steps:

Topology

Supplier -----------Replication agreement-----------------> Hub

Both replicas are in sync at this time as mentioned below.

Let's take this example: five entries have been added, with CSN1 to CSN5.

Take a db2ldif export with the "-r" option from the Hub replica.

Add another 5 entries in the supplier. Let's say their CSNs are CSN6 to 
CSN10.

Delete the replication agreements
Before or after CSN6 to CSN10 have been replicated to the Hub?


Delete the master changelog db file from the changelogdb directory.
Supplier or Hub?


Add another 5 entries in the supplier. Let's say their CSNs are CSN11 to 
CSN15.

Import the LDIF file taken in Step-2 (the db2ldif export) into the Hub replica 
(this is an initialization of the consumer with stale data).

Create the replication agreement between master and hub with the "do not 
initialize" option.

Now we will see data loss from CSN6 to CSN14. Only the entry with CSN15 is 
replicated to the consumer, and replication then continues successfully from 
there.



Questions:

Is it the correct approach in this scenario to continue with replication 
despite the data loss, instead of halting replication?

From the code analysis:

File: "ldapserver/ldap/servers/plugins/replication/cl5_api.c"

If the requested CSN is not found in the changelog db file and is also not in 
the purge list, the code makes the following assumption and continues with 
replication:



/* there is a special case which can occur just after migration - in this case,
  the consumer RUV will contain the last state of the supplier before migration,
  but the supplier will have an empty changelog, or the supplier changelog will
  not contain any entries within the consumer min and max CSN - also, since
  the purge RUV contains no CSNs, the changelog has never been purged
  ASSUMPTIONS - it is assumed that the supplier had no pending changes to send
  to any consumers; that is, we can assume that no changes were lost due to
  either changelog purging or database reload - bug# 603061 - [email protected] */


Would it be the correct approach in this scenario to halt replication with a 
fatal error message in the error log file?
Probably, but then this code would have to be a lot smarter to figure out that 
the problem is due to stale data being imported into the consumer.  Please file 
a bug with exact steps to reproduce this problem.
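
The difference between the two behaviours discussed here (continue on the 
assumption that nothing was lost, or halt with a fatal error) can be pictured 
with a toy model. This is only a sketch and not the logic in cl5_api.c: CSNs 
are plain integers, the changelog is a sorted list, and the names 
changes_to_send and MissingAnchorError are hypothetical. It also does not model 
which of the surviving changes the real server actually sends.

```python
from bisect import bisect_right
from typing import List


class MissingAnchorError(Exception):
    """The consumer's last CSN cannot be located in the supplier changelog."""


def changes_to_send(changelog: List[int], consumer_max_csn: int,
                    strict: bool) -> List[int]:
    """Pick the changes to replay from a sorted toy changelog.

    Toy assumption: CSNs are integers. The consumer's max CSN acts as the
    anchor; if it is present, everything after it is sent.
    """
    if consumer_max_csn in changelog:
        return changelog[changelog.index(consumer_max_csn) + 1:]

    # Anchor missing: only changes newer than the anchor are candidates.
    newer = changelog[bisect_right(changelog, consumer_max_csn):]
    if strict and newer:
        # Strict policy: changes between the anchor and the oldest
        # surviving record may have been lost, so refuse to continue.
        raise MissingAnchorError(
            f"CSN {consumer_max_csn} not in changelog; "
            f"oldest newer change is CSN {newer[0]}")
    # Lenient policy: assume nothing was lost and carry on.
    return newer
```

With a changelog holding only CSN11 to CSN15 and a consumer at CSN5, the 
lenient call returns the surviving changes and has no way to notice that CSN6 
to CSN10 are gone, while the strict call raises MissingAnchorError. An empty 
changelog returns an empty list under both policies, which corresponds to the 
"just after migration" case the comment describes.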




Regards,
Jyoti









--
389 users mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/389-users
