Hello everyone,

We have two clusters in production with the following configuration:
Cluster-A :  quincy v17.2.5
Cluster-B :  quincy v17.2.5
All images in the pool are mirrored, with snapshot-based mirroring enabled.
Each site runs 3 rbd-mirror daemons.
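
For reference, the pool-level mirroring setup can be inspected on each
cluster with the standard rbd commands; the pool name "data" below is just a
placeholder for ours:

    rbd mirror pool info data              # mirroring mode and configured peer sites
    rbd mirror pool status data --verbose  # per-image mirroring state summary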

We're testing disaster recovery with one-way mirroring in our block device
mirroring setup. On the primary site (Cluster-A), Ceph clients are attached
and a couple of images are present. These images are replicated to the
secondary site (Cluster-B).
During testing we've successfully performed failovers, with all resources
accessible from Cluster-B once the Ceph client is attached on the secondary
site.
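
For context, the failover itself was essentially a promotion of the images on
Cluster-B, along these lines (pool and image names are placeholders, and
--force is only needed when the primary site is unreachable):

    rbd --cluster cluster-b mirror image promote data/image-1 --force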

However, during failback (restoring the primary site), we've encountered an
issue: data that was pushed from the secondary site appears to have been
deleted, while data originally present only on the primary site remains
intact. Here are the steps we took during failback:
- Detached the client from Cluster-B.
- Verified that "mirroring primary" was true on Cluster-B and false on
Cluster-A.
- Demoted the images on Cluster-B and promoted the images on Cluster-A (see
the command sketch below).
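
Concretely, the demote/promote step was run roughly as follows (again with
placeholder names):

    rbd --cluster cluster-b mirror image demote data/image-1
    rbd --cluster cluster-a mirror image promote data/image-1
    rbd --cluster cluster-a info data/image-1   # should now report "mirroring primary: true"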

After performing these steps, our images went into an error state and
started syncing from Cluster-A to Cluster-B. In a failback scenario, however,
the direction should be from Cluster-B to Cluster-A, so that the writes made
on Cluster-B during the outage are preserved.
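
For completeness, the error state and the sync direction are visible per
image via the mirror status on either side (the state and description fields
show the error and which side is replaying):

    rbd --cluster cluster-a mirror image status data/image-1
    rbd --cluster cluster-b mirror image status data/image-1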

We are not sure where we are making a mistake. Could anybody please advise
on the correct failback procedure for one-way mirroring, and the safest way
to execute it without impacting our data?

Regards,
Mohammad Saif