I was able to reproduce this issue locally by restarting the primary OSD for the "rbd_mirroring" object. It seems a regression was introduced with Ceph msgr2: upon reconnect, the connection type for the client switches from ANY to V2 -- but only for the watcher session and not for the status updates, so the stored statuses no longer match the registered watcher and the images are reported as down. I've opened a tracker ticket for this issue [1].
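In case anyone wants to confirm they are hitting the same thing, roughly what I did to reproduce it (a sketch only -- substitute your own pool name and the OSD id reported as primary):

# ceph osd map rbd rbd_mirroring            <--- find the acting primary OSD for the object
# systemctl restart ceph-osd@<primary-id>   <--- bounce it so the rbd-mirror watcher has to reconnect
# rbd mirror pool status                    <--- images drop to "unknown"/"down+replaying" even though
# rados -p rbd listwatchers rbd_mirroring        the watcher is still registered and replay continues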
Thanks. On Fri, Sep 13, 2019 at 12:44 PM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > > Am 13.09.19 um 18:38 schrieb Jason Dillaman: > > On Fri, Sep 13, 2019 at 11:30 AM Oliver Freyermuth > > <freyerm...@physik.uni-bonn.de> wrote: > >> > >> Am 13.09.19 um 17:18 schrieb Jason Dillaman: > >>> On Fri, Sep 13, 2019 at 10:41 AM Oliver Freyermuth > >>> <freyerm...@physik.uni-bonn.de> wrote: > >>>> > >>>> Am 13.09.19 um 16:30 schrieb Jason Dillaman: > >>>>> On Fri, Sep 13, 2019 at 10:17 AM Jason Dillaman <jdill...@redhat.com> > >>>>> wrote: > >>>>>> > >>>>>> On Fri, Sep 13, 2019 at 10:02 AM Oliver Freyermuth > >>>>>> <freyerm...@physik.uni-bonn.de> wrote: > >>>>>>> > >>>>>>> Dear Jason, > >>>>>>> > >>>>>>> thanks for the very detailed explanation! This was very instructive. > >>>>>>> Sadly, the watchers look correct - see details inline. > >>>>>>> > >>>>>>> Am 13.09.19 um 15:02 schrieb Jason Dillaman: > >>>>>>>> On Thu, Sep 12, 2019 at 9:55 PM Oliver Freyermuth > >>>>>>>> <freyerm...@physik.uni-bonn.de> wrote: > >>>>>>>>> > >>>>>>>>> Dear Jason, > >>>>>>>>> > >>>>>>>>> thanks for taking care and developing a patch so quickly! > >>>>>>>>> > >>>>>>>>> I have another strange observation to share. In our test setup, > >>>>>>>>> only a single RBD mirroring daemon is running for 51 images. > >>>>>>>>> It works fine with a constant stream of 1-2 MB/s, but at some point > >>>>>>>>> after roughly 20 hours, _all_ images go to this interesting state: > >>>>>>>>> ----------------------------------------- > >>>>>>>>> # rbd mirror image status test-vm.XXXXX-disk2 > >>>>>>>>> test-vm.XXXXX-disk2: > >>>>>>>>> global_id: XXXXXXXXXXXXXXX > >>>>>>>>> state: down+replaying > >>>>>>>>> description: replaying, master_position=[object_number=14, > >>>>>>>>> tag_tid=6, entry_tid=6338], mirror_position=[object_number=14, > >>>>>>>>> tag_tid=6, entry_tid=6338], entries_behind_master=0 > >>>>>>>>> last_update: 2019-09-13 03:45:43 > >>>>>>>>> ----------------------------------------- > >>>>>>>>> Running this command several times, I see entry_tid increasing at > >>>>>>>>> both ends, so mirroring seems to be working just fine. > >>>>>>>>> > >>>>>>>>> However: > >>>>>>>>> ----------------------------------------- > >>>>>>>>> # rbd mirror pool status > >>>>>>>>> health: WARNING > >>>>>>>>> images: 51 total > >>>>>>>>> 51 unknown > >>>>>>>>> ----------------------------------------- > >>>>>>>>> The health warning is not visible in the dashboard (also not in the > >>>>>>>>> mirroring menu), the daemon still seems to be running, dropped > >>>>>>>>> nothing in the logs, > >>>>>>>>> and claims to be "ok" in the dashboard - it's only that all images > >>>>>>>>> show up in unknown state even though all seems to be working fine. > >>>>>>>>> > >>>>>>>>> Any idea on how to debug this? > >>>>>>>>> When I restart the rbd-mirror service, all images come back as > >>>>>>>>> green. I already encountered this twice in 3 days. > >>>>>>>> > >>>>>>>> The dashboard relies on the rbd-mirror daemon to provide it errors > >>>>>>>> and > >>>>>>>> warnings. 
You can see the status reported by rbd-mirror by running > >>>>>>>> "ceph service status": > >>>>>>>> > >>>>>>>> $ ceph service status > >>>>>>>> { > >>>>>>>> "rbd-mirror": { > >>>>>>>> "4152": { > >>>>>>>> "status_stamp": "2019-09-13T08:58:41.937491-0400", > >>>>>>>> "last_beacon": "2019-09-13T08:58:41.937491-0400", > >>>>>>>> "status": { > >>>>>>>> "json": > >>>>>>>> "{\"1\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":1,\"image_remote_count\":1,\"image_warning_count\":0,\"instance_id\":\"4154\",\"leader\":true},\"2\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0,\"instance_id\":\"4156\",\"leader\":true}}" > >>>>>>>> } > >>>>>>>> } > >>>>>>>> } > >>>>>>>> } > >>>>>>>> > >>>>>>>> In your case, most likely it seems like rbd-mirror thinks all is good > >>>>>>>> with the world so it's not reporting any errors. > >>>>>>> > >>>>>>> This is indeed the case: > >>>>>>> > >>>>>>> # ceph service status > >>>>>>> { > >>>>>>> "rbd-mirror": { > >>>>>>> "84243": { > >>>>>>> "status_stamp": "2019-09-13 15:40:01.149815", > >>>>>>> "last_beacon": "2019-09-13 15:40:26.151381", > >>>>>>> "status": { > >>>>>>> "json": > >>>>>>> "{\"2\":{\"name\":\"rbd\",\"callouts\":{},\"image_assigned_count\":51,\"image_error_count\":0,\"image_local_count\":51,\"image_remote_count\":51,\"image_warning_count\":0,\"instance_id\":\"84247\",\"leader\":true}}" > >>>>>>> } > >>>>>>> } > >>>>>>> }, > >>>>>>> "rgw": { > >>>>>>> ... > >>>>>>> } > >>>>>>> } > >>>>>>> > >>>>>>>> The "down" state indicates that the rbd-mirror daemon isn't correctly > >>>>>>>> watching the "rbd_mirroring" object in the pool. You can see who it > >>>>>>>> watching that object by running the "rados" "listwatchers" command: > >>>>>>>> > >>>>>>>> $ rados -p <pool name> listwatchers rbd_mirroring > >>>>>>>> watcher=1.2.3.4:0/199388543 client.4154 cookie=94769010788992 > >>>>>>>> watcher=1.2.3.4:0/199388543 client.4154 cookie=94769061031424 > >>>>>>>> > >>>>>>>> In my case, the "4154" from "client.4154" is the unique global id for > >>>>>>>> my connection to the cluster, which relates back to the "ceph service > >>>>>>>> status" dump which also shows status by daemon using the unique > >>>>>>>> global > >>>>>>>> id. > >>>>>>> > >>>>>>> Sadly(?), this looks as expected: > >>>>>>> > >>>>>>> # rados -p rbd listwatchers rbd_mirroring > >>>>>>> watcher=10.160.19.240:0/2922488671 client.84247 cookie=139770046978672 > >>>>>>> watcher=10.160.19.240:0/2922488671 client.84247 cookie=139771389162560 > >>>>>> > >>>>>> Hmm, the unique id is different (84243 vs 84247). I wouldn't have > >>>>>> expected the global id to have changed. Did you restart the Ceph > >>>>>> cluster or MONs? Do you see any "peer assigned me a different > >>>>>> global_id" errors in your rbd-mirror logs? > >>>>>> > >>>>>> I'll open a tracker ticket to fix the "ceph service status", though, > >>>>>> since clearly your global id changed but it wasn't noticed by the > >>>>>> service daemon status updater. > >>>>> > >>>>> ... also, can you please provide the output from the following via a > >>>>> pastebin link? > >>>>> > >>>>> # rados -p rbd listomapvals rbd_mirroring > >>>> > >>>> Of course, here you go: > >>>> https://0x0.st/zy8J.txt > >>> > >>> Thanks. 
For the case above of global image id > >>> 1a53fafa-37ef-4edf-9633-c2ba3323ed93, the on-disk status shows that it > >>> was last updated by client.84247 / nonce 2922488671, which correctly > >>> matches your watcher so the status should be "up": > >>> > >>> status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93 > >>> value (232 bytes) : > >>> 00000000 01 01 2c 00 00 00 08 17 49 01 00 00 00 00 00 01 > >>> |..,.....I.......| <--- "17 49 01 00 00 00 00 00" (84247) is the > >>> instance id > >>> 00000010 01 01 1c 00 00 00 03 00 00 00 5f a3 31 ae 10 00 > >>> |.........._.1...| <--- "5f a3 31 ae" is the nonce (2922488671) > >>> 00000020 00 00 02 00 00 00 0a a0 13 f0 00 00 00 00 00 00 > >>> |................| <--- "0a a0 13 f0" is the IP address > >>> (10.160.9.240) > >>> 00000030 00 00 01 01 b0 00 00 00 04 a2 00 00 00 72 65 70 > >>> |.............rep| > >>> 00000040 6c 61 79 69 6e 67 2c 20 6d 61 73 74 65 72 5f 70 |laying, > >>> master_p| > >>> 00000050 6f 73 69 74 69 6f 6e 3d 5b 6f 62 6a 65 63 74 5f > >>> |osition=[object_| > >>> 00000060 6e 75 6d 62 65 72 3d 31 39 2c 20 74 61 67 5f 74 |number=19, > >>> tag_t| > >>> 00000070 69 64 3d 36 2c 20 65 6e 74 72 79 5f 74 69 64 3d |id=6, > >>> entry_tid=| > >>> 00000080 32 36 34 34 33 5d 2c 20 6d 69 72 72 6f 72 5f 70 |26443], > >>> mirror_p| > >>> 00000090 6f 73 69 74 69 6f 6e 3d 5b 6f 62 6a 65 63 74 5f > >>> |osition=[object_| > >>> 000000a0 6e 75 6d 62 65 72 3d 31 39 2c 20 74 61 67 5f 74 |number=19, > >>> tag_t| > >>> 000000b0 69 64 3d 36 2c 20 65 6e 74 72 79 5f 74 69 64 3d |id=6, > >>> entry_tid=| > >>> 000000c0 32 36 34 34 33 5d 2c 20 65 6e 74 72 69 65 73 5f |26443], > >>> entries_| > >>> 000000d0 62 65 68 69 6e 64 5f 6d 61 73 74 65 72 3d 30 51 > >>> |behind_master=0Q| > >>> 000000e0 aa 7b 5d 1b 5f 4f 33 00 |.{]._O3.| > >>> 000000e8 > >>> > >>> The only thing I can think of is that somehow the watcher entity > >>> instance has a different encoding and its failing a comparison. Can > >>> you restart rbd-mirror such that the statuses list "up+replaying" and > >>> then run the following? > >>> > >>> # rados -p rbd getomapval rbd_mirroring > >>> status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93 > >> > >> Interesting! Again, thanks for the detailed context - learning a bit more > >> about the internals is one of the many reasons why we love Ceph so much, > >> and something which fully proprietary code will usually never deliver :-). 
> >> > >> Here's the output after the restart, image is in up+replaying state: > >> > >> # rados -p rbd getomapval rbd_mirroring > >> status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93 > >> value (232 bytes) : > >> 00000000 01 01 2c 00 00 00 08 ec 50 01 00 00 00 00 00 01 > >> |..,.....P.......| > >> 00000010 01 01 1c 00 00 00 03 00 00 00 0b 24 cd a5 10 00 > >> |...........$....| > >> 00000020 00 00 02 00 00 00 0a a0 13 f0 00 00 00 00 00 00 > >> |................| > >> 00000030 00 00 01 01 b0 00 00 00 04 a2 00 00 00 72 65 70 > >> |.............rep| > >> 00000040 6c 61 79 69 6e 67 2c 20 6d 61 73 74 65 72 5f 70 |laying, > >> master_p| > >> 00000050 6f 73 69 74 69 6f 6e 3d 5b 6f 62 6a 65 63 74 5f > >> |osition=[object_| > >> 00000060 6e 75 6d 62 65 72 3d 31 38 2c 20 74 61 67 5f 74 |number=18, > >> tag_t| > >> 00000070 69 64 3d 36 2c 20 65 6e 74 72 79 5f 74 69 64 3d |id=6, > >> entry_tid=| > >> 00000080 32 37 36 32 36 5d 2c 20 6d 69 72 72 6f 72 5f 70 |27626], > >> mirror_p| > >> 00000090 6f 73 69 74 69 6f 6e 3d 5b 6f 62 6a 65 63 74 5f > >> |osition=[object_| > >> 000000a0 6e 75 6d 62 65 72 3d 31 38 2c 20 74 61 67 5f 74 |number=18, > >> tag_t| > >> 000000b0 69 64 3d 36 2c 20 65 6e 74 72 79 5f 74 69 64 3d |id=6, > >> entry_tid=| > >> 000000c0 32 37 36 32 36 5d 2c 20 65 6e 74 72 69 65 73 5f |27626], > >> entries_| > >> 000000d0 62 65 68 69 6e 64 5f 6d 61 73 74 65 72 3d 30 eb > >> |behind_master=0.| > >> 000000e0 b3 7b 5d 27 9c d8 31 00 |.{]'..1.| > >> 000000e8 > >> > >> IIUC, this decodes to instance ID 86252, IP address of course stayed the > >> same. > >> > >> Checking the other output: > >> > >> # ceph service status > >> { > >> "rbd-mirror": { > >> "86248": { > >> "status_stamp": "2019-09-13 17:26:15.391048", > >> "last_beacon": "2019-09-13 17:26:25.391759", > >> "status": { > >> "json": > >> "{\"2\":{\"name\":\"rbd\",\"callouts\":{},\"image_assigned_count\":51,\"image_error_count\":0,\"image_local_count\":51,\"image_remote_count\":51,\"image_warning_count\":0,\"instance_id\":\"86252\",\"leader\":true}}" > >> } > >> } > >> }, > >> ... > >> } > >> > >> # rados -p rbd listwatchers rbd_mirroring > >> watcher=10.160.19.240:0/2781684747 client.86252 cookie=140089552292144 > >> watcher=10.160.19.240:0/2781684747 client.86252 cookie=140090961572928 > >> > >> This looks as strange as before again: Global instance ID is 86248, but > >> instance ID (and what I find in the omap dump) is 86252. > >> > >> However, things look okay in the dashboard again and also: > >> # rbd mirror pool status > >> health: OK > >> images: 51 total > >> 51 replaying > >> > >> Cheers, > >> Oliver > > > > Can you also provide the output from "ceph features"? > > Here you go: > ------------------------------------------------------ > # ceph features > { > "mon": [ > { > "features": "0x3ffddff8ffacffff", > "release": "luminous", > "num": 1 > } > ], > "osd": [ > { > "features": "0x3ffddff8ffacffff", > "release": "luminous", > "num": 6 > } > ], > "client": [ > { > "features": "0x3ffddff8ffacffff", > "release": "luminous", > "num": 6 > } > ], > "mgr": [ > { > "features": "0x3ffddff8ffacffff", > "release": "luminous", > "num": 1 > } > ] > } > ------------------------------------------------------ > This is a rather fresh Nautilus cluster, which has not yet seen any version > upgrade in its lifetime. 
> > Cheers, > Oliver > > > > > >> > >>> > >>> > >>> > >>> > >>> > >>>> Cheers, > >>>> Oliver > >>>> > >>>>> > >>>>>>> However, the dashboard still shows those images in "unknown", and > >>>>>>> this also shows up via command line: > >>>>>>> > >>>>>>> # rbd mirror pool status > >>>>>>> health: WARNING > >>>>>>> images: 51 total > >>>>>>> 51 unknown > >>>>>>> # rbd mirror image status test-vm.physik.uni-bonn.de-disk1 > >>>>>>> test-vm.physik.uni-bonn.de-disk2: > >>>>>>> global_id: 1a53fafa-37ef-4edf-9633-c2ba3323ed93 > >>>>>>> state: down+replaying > >>>>>>> description: replaying, master_position=[object_number=18, > >>>>>>> tag_tid=6, entry_tid=25202], mirror_position=[object_number=18, > >>>>>>> tag_tid=6, entry_tid=25202], entries_behind_master=0 > >>>>>>> last_update: 2019-09-13 15:55:15 > >>>>>>> > >>>>>>> Any ideas on what else could cause this? > >>>>>>> > >>>>>>> Cheers and thanks, > >>>>>>> Oliver > >>>>>>> > >>>>>>>> > >>>>>>>>> Any idea on this (or how I can extract more information)? > >>>>>>>>> I fear keeping high-level debug logs active for ~24h is not > >>>>>>>>> feasible. > >>>>>>>>> > >>>>>>>>> Cheers, > >>>>>>>>> Oliver > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 2019-09-11 19:14, Jason Dillaman wrote: > >>>>>>>>>> On Wed, Sep 11, 2019 at 12:57 PM Oliver Freyermuth > >>>>>>>>>> <freyerm...@physik.uni-bonn.de> wrote: > >>>>>>>>>>> > >>>>>>>>>>> Dear Jason, > >>>>>>>>>>> > >>>>>>>>>>> I played a bit more with rbd mirroring and learned that deleting > >>>>>>>>>>> an image at the source (or disabling journaling on it) > >>>>>>>>>>> immediately moves the image to trash at the target - > >>>>>>>>>>> but setting rbd_mirroring_delete_delay helps to have some more > >>>>>>>>>>> grace time to catch human mistakes. > >>>>>>>>>>> > >>>>>>>>>>> However, I have issues restoring such an image which has been > >>>>>>>>>>> moved to trash by the RBD-mirror daemon as user: > >>>>>>>>>>> ----------------------------------- > >>>>>>>>>>> [root@mon001 ~]# rbd trash ls -la > >>>>>>>>>>> ID NAME SOURCE > >>>>>>>>>>> DELETED_AT STATUS > >>>>>>>>>>> PARENT > >>>>>>>>>>> d4fbe8f63905 test-vm-XXXXXXXXXXXXXXXXXX-disk2 MIRRORING Wed Sep > >>>>>>>>>>> 11 18:43:14 2019 protected until Thu Sep 12 18:43:14 2019 > >>>>>>>>>>> [root@mon001 ~]# rbd trash restore --image foo-image d4fbe8f63905 > >>>>>>>>>>> rbd: restore error: 2019-09-11 18:50:15.387 7f5fa9590b00 -1 > >>>>>>>>>>> librbd::api::Trash: restore: Current trash source: mirroring does > >>>>>>>>>>> not match expected: user > >>>>>>>>>>> (22) Invalid argument > >>>>>>>>>>> ----------------------------------- > >>>>>>>>>>> This is issued on the mon, which has the client.admin key, so it > >>>>>>>>>>> should not be a permission issue. > >>>>>>>>>>> It also fails when I try that in the Dashboard. > >>>>>>>>>>> > >>>>>>>>>>> Sadly, the error message is not clear enough for me to figure out > >>>>>>>>>>> what could be the problem - do you see what I did wrong? > >>>>>>>>>> > >>>>>>>>>> Good catch, it looks like we accidentally broke this in Nautilus > >>>>>>>>>> when > >>>>>>>>>> image live-migration support was added. I've opened a new tracker > >>>>>>>>>> ticket to fix this [1]. 
> >>>>>>>>>> > >>>>>>>>>>> Cheers and thanks again, > >>>>>>>>>>> Oliver > >>>>>>>>>>> > >>>>>>>>>>> On 2019-09-10 23:17, Oliver Freyermuth wrote: > >>>>>>>>>>>> Dear Jason, > >>>>>>>>>>>> > >>>>>>>>>>>> On 2019-09-10 23:04, Jason Dillaman wrote: > >>>>>>>>>>>>> On Tue, Sep 10, 2019 at 2:08 PM Oliver Freyermuth > >>>>>>>>>>>>> <freyerm...@physik.uni-bonn.de> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Dear Jason, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 2019-09-10 18:50, Jason Dillaman wrote: > >>>>>>>>>>>>>>> On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth > >>>>>>>>>>>>>>> <freyerm...@physik.uni-bonn.de> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Dear Cephalopodians, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I have two questions about RBD mirroring. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 1) I can not get it to work - my setup is: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> - One cluster holding the live RBD volumes and > >>>>>>>>>>>>>>>> snapshots, in pool "rbd", cluster name "ceph", > >>>>>>>>>>>>>>>> running latest Mimic. > >>>>>>>>>>>>>>>> I ran "rbd mirror pool enable rbd pool" on that > >>>>>>>>>>>>>>>> cluster and created a cephx user "rbd_mirror" with (is there > >>>>>>>>>>>>>>>> a better way?): > >>>>>>>>>>>>>>>> ceph auth get-or-create client.rbd_mirror mon > >>>>>>>>>>>>>>>> 'allow r' osd 'allow class-read object_prefix rbd_children, > >>>>>>>>>>>>>>>> allow pool rbd r' -o ceph.client.rbd_mirror.keyring > >>>>>>>>>>>>>>>> --cluster ceph > >>>>>>>>>>>>>>>> In that pool, two images have the journaling > >>>>>>>>>>>>>>>> feature activated, all others have it disabled still (so I > >>>>>>>>>>>>>>>> would expect these two to be mirrored). > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> You can just use "mon 'profile rbd' osd 'profile rbd'" for > >>>>>>>>>>>>>>> the caps -- > >>>>>>>>>>>>>>> but you definitely need more than read-only permissions to > >>>>>>>>>>>>>>> the remote > >>>>>>>>>>>>>>> cluster since it needs to be able to create snapshots of > >>>>>>>>>>>>>>> remote images > >>>>>>>>>>>>>>> and update/trim the image journals. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> these profiles really make life a lot easier. I should have > >>>>>>>>>>>>>> thought of them rather than "guessing" a potentially good > >>>>>>>>>>>>>> configuration... > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> - Another (empty) cluster running latest Nautilus, > >>>>>>>>>>>>>>>> cluster name "ceph", pool "rbd". > >>>>>>>>>>>>>>>> I've used the dashboard to activate mirroring for > >>>>>>>>>>>>>>>> the RBD pool, and then added a peer with cluster name > >>>>>>>>>>>>>>>> "ceph-virt", cephx-ID "rbd_mirror", filled in the mons and > >>>>>>>>>>>>>>>> key created above. > >>>>>>>>>>>>>>>> I've then run: > >>>>>>>>>>>>>>>> ceph auth get-or-create client.rbd_mirror_backup > >>>>>>>>>>>>>>>> mon 'allow r' osd 'allow class-read object_prefix > >>>>>>>>>>>>>>>> rbd_children, allow pool rbd rwx' -o > >>>>>>>>>>>>>>>> client.rbd_mirror_backup.keyring --cluster ceph > >>>>>>>>>>>>>>>> and deployed that key on the rbd-mirror machine, > >>>>>>>>>>>>>>>> and started the service with: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for > >>>>>>>>>>>>>>> your caps [1]. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> That did the trick (in combination with the above)! > >>>>>>>>>>>>>> Again a case of PEBKAC: I should have read the documentation > >>>>>>>>>>>>>> until the end, clearly my fault. 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> It works well now, even though it seems to run a bit slow (~35 > >>>>>>>>>>>>>> MB/s for the initial sync when everything is 1 GBit/s), > >>>>>>>>>>>>>> but that may also be caused by combination of some very > >>>>>>>>>>>>>> limited hardware on the receiving end (which will be scaled up > >>>>>>>>>>>>>> in the future). > >>>>>>>>>>>>>> A single host with 6 disks, replica 3 and a RAID controller > >>>>>>>>>>>>>> which can only do RAID0 and not JBOD is certainly not ideal, > >>>>>>>>>>>>>> so commit latency may cause this slow bandwidth. > >>>>>>>>>>>>> > >>>>>>>>>>>>> You could try increasing "rbd_concurrent_management_ops" from > >>>>>>>>>>>>> the > >>>>>>>>>>>>> default of 10 ops to something higher to attempt to account for > >>>>>>>>>>>>> the > >>>>>>>>>>>>> latency. However, I wouldn't expect near-line speed w/ RBD > >>>>>>>>>>>>> mirroring. > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks - I will play with this option once we have more storage > >>>>>>>>>>>> available in the target pool ;-). > >>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> systemctl start > >>>>>>>>>>>>>>>> ceph-rbd-mirror@rbd_mirror_backup.service > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> After this, everything looks fine: > >>>>>>>>>>>>>>>> # rbd mirror pool info > >>>>>>>>>>>>>>>> Mode: pool > >>>>>>>>>>>>>>>> Peers: > >>>>>>>>>>>>>>>> UUID NAME > >>>>>>>>>>>>>>>> CLIENT > >>>>>>>>>>>>>>>> XXXXXXXXXXX ceph-virt > >>>>>>>>>>>>>>>> client.rbd_mirror > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> The service also seems to start fine, but logs show > >>>>>>>>>>>>>>>> (debug rbd_mirror=20): > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> rbd::mirror::ClusterWatcher:0x5575e2a7d390 > >>>>>>>>>>>>>>>> resolve_peer_config_keys: retrieving config-key: pool_id=2, > >>>>>>>>>>>>>>>> pool_name=rbd, peer_uuid=XXXXXXXXXXX > >>>>>>>>>>>>>>>> rbd::mirror::Mirror: 0x5575e29c7240 > >>>>>>>>>>>>>>>> update_pool_replayers: enter > >>>>>>>>>>>>>>>> rbd::mirror::Mirror: 0x5575e29c7240 > >>>>>>>>>>>>>>>> update_pool_replayers: restarting failed pool replayer for > >>>>>>>>>>>>>>>> uuid: XXXXXXXXXXX cluster: ceph-virt client: > >>>>>>>>>>>>>>>> client.rbd_mirror > >>>>>>>>>>>>>>>> rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: > >>>>>>>>>>>>>>>> replaying for uuid: XXXXXXXXXXX cluster: ceph-virt client: > >>>>>>>>>>>>>>>> client.rbd_mirror > >>>>>>>>>>>>>>>> rbd::mirror::PoolReplayer: 0x5575e2a7da20 > >>>>>>>>>>>>>>>> init_rados: error connecting to remote peer uuid: > >>>>>>>>>>>>>>>> XXXXXXXXXXX cluster: ceph-virt client: client.rbd_mirror: > >>>>>>>>>>>>>>>> (95) Operation not supported > >>>>>>>>>>>>>>>> rbd::mirror::ServiceDaemon: 0x5575e29c8d70 > >>>>>>>>>>>>>>>> add_or_update_callout: pool_id=2, callout_id=2, > >>>>>>>>>>>>>>>> callout_level=error, text=unable to connect to remote cluster > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> If it's still broken after fixing your caps above, perhaps > >>>>>>>>>>>>>>> increase > >>>>>>>>>>>>>>> debugging for "rados", "monc", "auth", and "ms" to see if you > >>>>>>>>>>>>>>> can > >>>>>>>>>>>>>>> determine the source of the op not supported error. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I already tried storing the ceph.client.rbd_mirror.keyring > >>>>>>>>>>>>>>>> (i.e. from the cluster with the live images) on the > >>>>>>>>>>>>>>>> rbd-mirror machine explicitly (i.e. not only in mon config > >>>>>>>>>>>>>>>> storage), > >>>>>>>>>>>>>>>> and after doing that: > >>>>>>>>>>>>>>>> rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls > >>>>>>>>>>>>>>>> works fine. 
So it's not a connectivity issue. Maybe a > >>>>>>>>>>>>>>>> permission issue? Or did I miss something? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Any idea what "operation not supported" means? > >>>>>>>>>>>>>>>> It's unclear to me whether things should work well using > >>>>>>>>>>>>>>>> Mimic with Nautilus, and enabling pool mirroring but only > >>>>>>>>>>>>>>>> having journaling on for two images is a supported case. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Yes and yes. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 2) Since there is a performance drawback (about 2x) for > >>>>>>>>>>>>>>>> journaling, is it also possible to only mirror snapshots, > >>>>>>>>>>>>>>>> and leave the live volumes alone? > >>>>>>>>>>>>>>>> This would cover the common backup usecase before > >>>>>>>>>>>>>>>> deferred mirroring is implemented (or is it there already?). > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> This is in-development right now and will hopefully land for > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>> Octopus release. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> That would be very cool. Just to clarify: You mean the "real" > >>>>>>>>>>>>>> deferred mirroring, not a "snapshot only" mirroring? > >>>>>>>>>>>>>> Is it already clear if this will require Octopous (or a later > >>>>>>>>>>>>>> release) on both ends, or only on the receiving side? > >>>>>>>>>>>>> > >>>>>>>>>>>>> I might not be sure what you mean by deferred mirroring. You > >>>>>>>>>>>>> can delay > >>>>>>>>>>>>> the replay of the journal via the "rbd_mirroring_replay_delay" > >>>>>>>>>>>>> configuration option so that your DR site can be X seconds > >>>>>>>>>>>>> behind the > >>>>>>>>>>>>> primary at a minimum. > >>>>>>>>>>>> > >>>>>>>>>>>> This is indeed what I was thinking of... > >>>>>>>>>>>> > >>>>>>>>>>>>> For Octopus we are working on on-demand and > >>>>>>>>>>>>> scheduled snapshot mirroring between sites -- no journal is > >>>>>>>>>>>>> involved. > >>>>>>>>>>>> > >>>>>>>>>>>> ... and this is what I was dreaming of. We keep snapshots of VMs > >>>>>>>>>>>> to be able to roll them back. > >>>>>>>>>>>> We'd like to also keep those snapshots in a separate Ceph > >>>>>>>>>>>> instance as an additional safety-net (in addition to an offline > >>>>>>>>>>>> backup of those snapshots with Benji backup). > >>>>>>>>>>>> It is not (yet) clear to me whether we can pay the "2 x" price > >>>>>>>>>>>> for journaling in the long run, so this would be the way to go > >>>>>>>>>>>> in case we can't. > >>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Since I got you personally, I have two bonus questions. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 1) Your talk: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring.pdf > >>>>>>>>>>>>>> mentions "rbd journal object flush age", which I'd > >>>>>>>>>>>>>> translate with something like the "commit" mount option on a > >>>>>>>>>>>>>> classical file system - correct? > >>>>>>>>>>>>>> I don't find this switch documented anywhere, though - > >>>>>>>>>>>>>> is there experience with it / what's the default? > >>>>>>>>>>>>> > >>>>>>>>>>>>> It's a low-level knob that by default causes the journal to > >>>>>>>>>>>>> flush its > >>>>>>>>>>>>> pending IO events before it allows the corresponding IO to be > >>>>>>>>>>>>> issued > >>>>>>>>>>>>> against the backing image. 
Setting it to a value greater that > >>>>>>>>>>>>> zero > >>>>>>>>>>>>> will allow that many seconds of IO events to be batched > >>>>>>>>>>>>> together in a > >>>>>>>>>>>>> journal append operation and its helpful for high-throughout, > >>>>>>>>>>>>> small IO > >>>>>>>>>>>>> operations. Of course it turned out that a bug had broken that > >>>>>>>>>>>>> option > >>>>>>>>>>>>> a while where events would never batch, so a fix is currently > >>>>>>>>>>>>> scheduled for backport of all active releases [1] w/ the goal > >>>>>>>>>>>>> that no > >>>>>>>>>>>>> one should need to tweak it. > >>>>>>>>>>>> > >>>>>>>>>>>> That's even better - since our setup is growing and we will keep > >>>>>>>>>>>> upgrading, I'll then just keep things as they are now (no manual > >>>>>>>>>>>> tweaking) > >>>>>>>>>>>> and tag along the development. Thanks! > >>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>> 2) I read I can run more than one rbd-mirror with > >>>>>>>>>>>>>> Mimic/Nautilus. Do they load-balance the images, or "only" > >>>>>>>>>>>>>> failover in case one of them dies? > >>>>>>>>>>>>> > >>>>>>>>>>>>> Starting with Nautilus, the default configuration for > >>>>>>>>>>>>> rbd-mirror is to > >>>>>>>>>>>>> evenly divide the number of mirrored images between all running > >>>>>>>>>>>>> daemons. This does not split the total load since some images > >>>>>>>>>>>>> might be > >>>>>>>>>>>>> hotter than others, but it at least spreads the load. > >>>>>>>>>>>> > >>>>>>>>>>>> That's fine enough for our use case. Spreading by "hotness" is a > >>>>>>>>>>>> task without a clear answer > >>>>>>>>>>>> and "temperature" may change quickly, so that's all I hoped for. > >>>>>>>>>>>> > >>>>>>>>>>>> Many thanks again for the very helpful explanations! > >>>>>>>>>>>> > >>>>>>>>>>>> Cheers, > >>>>>>>>>>>> Oliver > >>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Cheers and many thanks for the quick and perfect help! > >>>>>>>>>>>>>> Oliver > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Cheers and thanks in advance, > >>>>>>>>>>>>>>>> Oliver > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>>>>>> ceph-users mailing list > >>>>>>>>>>>>>>>> ceph-users@lists.ceph.com > >>>>>>>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> [1] > >>>>>>>>>>>>>>> https://docs.ceph.com/docs/master/rbd/rbd-mirroring/#rbd-mirror-daemon > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>> Jason > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> [1] https://github.com/ceph/ceph/pull/28539 > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> [1] https://tracker.ceph.com/issues/41780 > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> > >>>>>> -- > >>>>>> Jason > >>>>> > >>>>> > >>>>> > >>>> > >>>> > >>> > >>> > >>> -- > >>> Jason > >>> > >> > >> > > > > > [1] https://tracker.ceph.com/issues/41833 -- Jason _______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com