Dear all,

replying to my own question ;-)


This document explains the rbd mirroring / journaling process in more detail:
https://pad.ceph.com/p/I-rbd_mirroring


especially this part:
on startup, replay journal from flush position
Store journal metadata in journal header, to be more general

  *   flush position

  *   per-zone flush positions

pointers to positions in the journal (object, offset)
- one for each reader so we can tell how far we can trim
- store trim pos in primary and secondary zones, so despite loss of primary dc 
we can tell who's most up to date
=> So apparently there is one pointer to a position in the journal for each
secondary image (journal reader) and, importantly, also one for the primary
image (normally the journal writer, but also a reader during open / crash
recovery).
This appears to confirm that clients on the primary not only write to the
journal (to support replication to the secondary) but also actively read from
it after a crash, to replay the latest IOs that were missing on the primary
image.
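
To make the pointer idea concrete, here is a toy Python sketch. It is not
librbd code: ToyJournal, replay() and the client names "primary" and "mirror"
are hypothetical, and the real journal stores (object, offset) positions
rather than list indexes. It only illustrates one commit position per reader,
trimming up to the slowest reader, and the primary replaying on re-open.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJournal:
    """Toy append-only journal with one commit position per client.

    Hypothetical illustration only; not the librbd journal API.
    """
    events: list = field(default_factory=list)      # untrimmed (offset, data) events
    base: int = 0                                    # absolute index of events[0]
    commit_pos: dict = field(default_factory=dict)   # client -> next index to apply

    def register(self, client):
        self.commit_pos[client] = self.base

    def append(self, offset, data):
        self.events.append((offset, data))

    def pending(self, client):
        # Everything this client has not yet applied to its image.
        return self.events[self.commit_pos[client] - self.base:]

    def commit(self, client, count):
        self.commit_pos[client] += count

    def trim(self):
        # Only events that every registered client has applied may be dropped.
        low = min(self.commit_pos.values())
        del self.events[: low - self.base]
        self.base = low


def replay(journal, client, image):
    """Apply all pending events to the image, then advance the commit position."""
    entries = journal.pending(client)
    for offset, data in entries:
        image[offset] = data
    journal.commit(client, len(entries))
    journal.trim()


journal = ToyJournal()
journal.register("primary")
journal.register("mirror")
primary, secondary = {}, {}

journal.append(0, b"A")
primary[0] = b"A"
journal.commit("primary", 1)
journal.append(1, b"B")               # client crashes before updating the image

replay(journal, "mirror", secondary)  # secondary applies A and B
replay(journal, "primary", primary)   # on re-open, primary replays B
```

In this toy model both images end up identical and the journal is fully
trimmed once the primary has replayed the missing event.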


Also useful info: https://tracker.ceph.com/projects/ceph/wiki/RBD_-_Mirroring
  *   on open, replay recent journal operations
  *   periodically update a journal position pointer in the rbd image header
      (to limit replays on open)

and this: https://docs.ceph.com/en/pacific/rbd/rbd-mirroring/#force-image-resync
If a split-brain event is detected by the rbd-mirror daemon, it will not 
attempt to mirror the affected image until corrected.

cheers
Francois Scheurer




--


EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheu...@everyware.ch
web: http://www.everyware.ch
________________________________
From: Scheurer François <francois.scheu...@everyware.ch>
Sent: Tuesday, October 3, 2023 4:38:07 PM
To: dilla...@redhat.com; ceph-users@ceph.io
Subject: [ceph-users] is the rbd mirror journal replayed on primary after a 
crash?


Hello



Short question regarding journal-based rbd mirroring.


IO path with journaling, without cache:

a. Create an event to describe the update
b. Asynchronously append event to journal object
c. Asynchronously update image once event is safe
d. Complete IO to client once update is safe


[cf. 
https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring_0.pdf]


If a client crashes between b. and c., is there a mechanism to replay the IO 
from the journal on the primary image?

If not, then the primary and secondary images would get out-of-sync (because of 
the extra write(s) on secondary) and subsequent writes to the primary would 
corrupt the secondary. Is that correct?
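
The crash window in question can be sketched in a few lines of Python. This
is a simplified, synchronous toy model of steps a. to d. (the real path is
asynchronous); journaled_write() and the crash_after_append flag are
hypothetical names used only for illustration.

```python
def journaled_write(journal, image, offset, data, crash_after_append=False):
    """Toy version of the a-d IO path; returns True if the IO was completed."""
    event = {"offset": offset, "data": data}   # a. create an event
    journal.append(event)                      # b. append event to the journal
    if crash_after_append:
        return False                           # client dies; IO is never completed
    image[event["offset"]] = event["data"]     # c. update image once event is safe
    return True                                # d. complete IO to the client


journal, image = [], {}
first_ok = journaled_write(journal, image, 0, b"A")
second_ok = journaled_write(journal, image, 1, b"B", crash_after_append=True)

# Events recorded in the journal but not yet reflected in the image:
missing = [e for e in journal if image.get(e["offset"]) != e["data"]]
```

After the simulated crash the journal holds two events while the image holds
only the first write, which is exactly the state the question is about.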



Cheers

Francois Scheurer






_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
