I was able to repeat this issue locally by restarting the primary OSD
for the "rbd_mirroring" object. It seems that a regression was
introduced with Ceph msgr2: upon reconnect, the connection type for
the client switches from ANY to V2, but only for the watcher session
and not for the status updates. I've opened a tracker ticket for this
issue [1].
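
For anyone who wants to trigger the same reconnect locally, a minimal
sketch (assuming the pool is named "rbd"; substitute the OSD id that
"ceph osd map" reports as the acting primary):

$ ceph osd map rbd rbd_mirroring    <--- note the acting primary, e.g. "p2"
$ systemctl restart ceph-osd@2      <--- on the host that holds that OSD
$ rbd mirror pool status            <--- the images now show up as "unknown"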

Thanks.

On Fri, Sep 13, 2019 at 12:44 PM Oliver Freyermuth
<freyerm...@physik.uni-bonn.de> wrote:
>
> Am 13.09.19 um 18:38 schrieb Jason Dillaman:
> > On Fri, Sep 13, 2019 at 11:30 AM Oliver Freyermuth
> > <freyerm...@physik.uni-bonn.de> wrote:
> >>
> >> Am 13.09.19 um 17:18 schrieb Jason Dillaman:
> >>> On Fri, Sep 13, 2019 at 10:41 AM Oliver Freyermuth
> >>> <freyerm...@physik.uni-bonn.de> wrote:
> >>>>
> >>>> Am 13.09.19 um 16:30 schrieb Jason Dillaman:
> >>>>> On Fri, Sep 13, 2019 at 10:17 AM Jason Dillaman <jdill...@redhat.com> 
> >>>>> wrote:
> >>>>>>
> >>>>>> On Fri, Sep 13, 2019 at 10:02 AM Oliver Freyermuth
> >>>>>> <freyerm...@physik.uni-bonn.de> wrote:
> >>>>>>>
> >>>>>>> Dear Jason,
> >>>>>>>
> >>>>>>> thanks for the very detailed explanation! This was very instructive.
> >>>>>>> Sadly, the watchers look correct - see details inline.
> >>>>>>>
> >>>>>>> Am 13.09.19 um 15:02 schrieb Jason Dillaman:
> >>>>>>>> On Thu, Sep 12, 2019 at 9:55 PM Oliver Freyermuth
> >>>>>>>> <freyerm...@physik.uni-bonn.de> wrote:
> >>>>>>>>>
> >>>>>>>>> Dear Jason,
> >>>>>>>>>
> >>>>>>>>> thanks for taking care and developing a patch so quickly!
> >>>>>>>>>
> >>>>>>>>> I have another strange observation to share. In our test setup, 
> >>>>>>>>> only a single RBD mirroring daemon is running for 51 images.
> >>>>>>>>> It works fine with a constant stream of 1-2 MB/s, but at some point 
> >>>>>>>>> after roughly 20 hours, _all_ images go to this interesting state:
> >>>>>>>>> -----------------------------------------
> >>>>>>>>> # rbd mirror image status test-vm.XXXXX-disk2
> >>>>>>>>> test-vm.XXXXX-disk2:
> >>>>>>>>>       global_id:   XXXXXXXXXXXXXXX
> >>>>>>>>>       state:       down+replaying
> >>>>>>>>>       description: replaying, master_position=[object_number=14, 
> >>>>>>>>> tag_tid=6, entry_tid=6338], mirror_position=[object_number=14, 
> >>>>>>>>> tag_tid=6, entry_tid=6338], entries_behind_master=0
> >>>>>>>>>       last_update: 2019-09-13 03:45:43
> >>>>>>>>> -----------------------------------------
> >>>>>>>>> Running this command several times, I see entry_tid increasing at 
> >>>>>>>>> both ends, so mirroring seems to be working just fine.
> >>>>>>>>>
> >>>>>>>>> However:
> >>>>>>>>> -----------------------------------------
> >>>>>>>>> # rbd mirror pool status
> >>>>>>>>> health: WARNING
> >>>>>>>>> images: 51 total
> >>>>>>>>>         51 unknown
> >>>>>>>>> -----------------------------------------
> >>>>>>>>> The health warning is not visible in the dashboard (also not in the
> >>>>>>>>> mirroring menu), the daemon still seems to be running, has dropped
> >>>>>>>>> nothing into the logs, and claims to be "ok" in the dashboard - it's
> >>>>>>>>> only that all images show up in unknown state even though everything
> >>>>>>>>> seems to be working fine.
> >>>>>>>>>
> >>>>>>>>> Any idea on how to debug this?
> >>>>>>>>> When I restart the rbd-mirror service, all images come back as 
> >>>>>>>>> green. I already encountered this twice in 3 days.
> >>>>>>>>
> >>>>>>>> The dashboard relies on the rbd-mirror daemon to provide it errors 
> >>>>>>>> and
> >>>>>>>> warnings. You can see the status reported by rbd-mirror by running
> >>>>>>>> "ceph service status":
> >>>>>>>>
> >>>>>>>> $ ceph service status
> >>>>>>>> {
> >>>>>>>>         "rbd-mirror": {
> >>>>>>>>             "4152": {
> >>>>>>>>                 "status_stamp": "2019-09-13T08:58:41.937491-0400",
> >>>>>>>>                 "last_beacon": "2019-09-13T08:58:41.937491-0400",
> >>>>>>>>                 "status": {
> >>>>>>>>                     "json":
> >>>>>>>> "{\"1\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":1,\"image_remote_count\":1,\"image_warning_count\":0,\"instance_id\":\"4154\",\"leader\":true},\"2\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0,\"instance_id\":\"4156\",\"leader\":true}}"
> >>>>>>>>                 }
> >>>>>>>>             }
> >>>>>>>>         }
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> In your case, it seems like rbd-mirror thinks all is good with the
> >>>>>>>> world, so it's not reporting any errors.
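> >>>>>>>>
> >>>>>>>> If you don't want to eyeball the escaped string, something along
> >>>>>>>> these lines unwraps the embedded status JSON (a sketch, assuming jq
> >>>>>>>> is available and the command output is JSON as shown above):
> >>>>>>>>
> >>>>>>>> $ ceph service status | jq '.["rbd-mirror"][].status.json | fromjson'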
> >>>>>>>
> >>>>>>> This is indeed the case:
> >>>>>>>
> >>>>>>> # ceph service status
> >>>>>>> {
> >>>>>>>         "rbd-mirror": {
> >>>>>>>             "84243": {
> >>>>>>>                 "status_stamp": "2019-09-13 15:40:01.149815",
> >>>>>>>                 "last_beacon": "2019-09-13 15:40:26.151381",
> >>>>>>>                 "status": {
> >>>>>>>                     "json": 
> >>>>>>> "{\"2\":{\"name\":\"rbd\",\"callouts\":{},\"image_assigned_count\":51,\"image_error_count\":0,\"image_local_count\":51,\"image_remote_count\":51,\"image_warning_count\":0,\"instance_id\":\"84247\",\"leader\":true}}"
> >>>>>>>                 }
> >>>>>>>             }
> >>>>>>>         },
> >>>>>>>         "rgw": {
> >>>>>>> ...
> >>>>>>>         }
> >>>>>>> }
> >>>>>>>
> >>>>>>>> The "down" state indicates that the rbd-mirror daemon isn't correctly
> >>>>>>>> watching the "rbd_mirroring" object in the pool. You can see who it
> >>>>>>>> watching that object by running the "rados" "listwatchers" command:
> >>>>>>>>
> >>>>>>>> $ rados -p <pool name> listwatchers rbd_mirroring
> >>>>>>>> watcher=1.2.3.4:0/199388543 client.4154 cookie=94769010788992
> >>>>>>>> watcher=1.2.3.4:0/199388543 client.4154 cookie=94769061031424
> >>>>>>>>
> >>>>>>>> In my case, the "4154" from "client.4154" is the unique global id for
> >>>>>>>> my connection to the cluster, which relates back to the "ceph service
> >>>>>>>> status" dump which also shows status by daemon using the unique 
> >>>>>>>> global
> >>>>>>>> id.
> >>>>>>>
> >>>>>>> Sadly(?), this looks as expected:
> >>>>>>>
> >>>>>>> # rados -p rbd listwatchers rbd_mirroring
> >>>>>>> watcher=10.160.19.240:0/2922488671 client.84247 cookie=139770046978672
> >>>>>>> watcher=10.160.19.240:0/2922488671 client.84247 cookie=139771389162560
> >>>>>>
> >>>>>> Hmm, the unique id is different (84243 vs 84247). I wouldn't have
> >>>>>> expected the global id to have changed. Did you restart the Ceph
> >>>>>> cluster or MONs? Do you see any "peer assigned me a different
> >>>>>> global_id" errors in your rbd-mirror logs?
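> >>>>>>
> >>>>>> Something like this should surface them (a sketch, assuming the
> >>>>>> daemon runs under the systemd unit you mentioned earlier):
> >>>>>>
> >>>>>> $ journalctl -u ceph-rbd-mirror@rbd_mirror_backup.service | grep -i global_id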
> >>>>>>
> >>>>>> I'll open a tracker ticket to fix the "ceph service status", though,
> >>>>>> since clearly your global id changed but it wasn't noticed by the
> >>>>>> service daemon status updater.
> >>>>>
> >>>>> ... also, can you please provide the output from the following via a
> >>>>> pastebin link?
> >>>>>
> >>>>> # rados -p rbd listomapvals rbd_mirroring
> >>>>
> >>>> Of course, here you go:
> >>>> https://0x0.st/zy8J.txt
> >>>
> >>> Thanks. For the case above of global image id
> >>> 1a53fafa-37ef-4edf-9633-c2ba3323ed93, the on-disk status shows that it
> >>> was last updated by client.84247 / nonce 2922488671, which correctly
> >>> matches your watcher so the status should be "up":
> >>>
> >>> status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93
> >>> value (232 bytes) :
> >>> 00000000  01 01 2c 00 00 00 08 17  49 01 00 00 00 00 00 01  |..,.....I.......|  <--- "17 49 01 00 00 00 00 00" (84247) is the instance id
> >>> 00000010  01 01 1c 00 00 00 03 00  00 00 5f a3 31 ae 10 00  |.........._.1...|  <--- "5f a3 31 ae" is the nonce (2922488671)
> >>> 00000020  00 00 02 00 00 00 0a a0  13 f0 00 00 00 00 00 00  |................|  <--- "0a a0 13 f0" is the IP address (10.160.19.240)
> >>> 00000030  00 00 01 01 b0 00 00 00  04 a2 00 00 00 72 65 70  |.............rep|
> >>> 00000040  6c 61 79 69 6e 67 2c 20  6d 61 73 74 65 72 5f 70  |laying, master_p|
> >>> 00000050  6f 73 69 74 69 6f 6e 3d  5b 6f 62 6a 65 63 74 5f  |osition=[object_|
> >>> 00000060  6e 75 6d 62 65 72 3d 31  39 2c 20 74 61 67 5f 74  |number=19, tag_t|
> >>> 00000070  69 64 3d 36 2c 20 65 6e  74 72 79 5f 74 69 64 3d  |id=6, entry_tid=|
> >>> 00000080  32 36 34 34 33 5d 2c 20  6d 69 72 72 6f 72 5f 70  |26443], mirror_p|
> >>> 00000090  6f 73 69 74 69 6f 6e 3d  5b 6f 62 6a 65 63 74 5f  |osition=[object_|
> >>> 000000a0  6e 75 6d 62 65 72 3d 31  39 2c 20 74 61 67 5f 74  |number=19, tag_t|
> >>> 000000b0  69 64 3d 36 2c 20 65 6e  74 72 79 5f 74 69 64 3d  |id=6, entry_tid=|
> >>> 000000c0  32 36 34 34 33 5d 2c 20  65 6e 74 72 69 65 73 5f  |26443], entries_|
> >>> 000000d0  62 65 68 69 6e 64 5f 6d  61 73 74 65 72 3d 30 51  |behind_master=0Q|
> >>> 000000e0  aa 7b 5d 1b 5f 4f 33 00                           |.{]._O3.|
> >>> 000000e8
> >>>
> >>> The only thing I can think of is that somehow the watcher entity
> >>> instance has a different encoding and it's failing a comparison. Can
> >>> you restart rbd-mirror such that the statuses list "up+replaying" and
> >>> then run the following?
> >>>
> >>> # rados -p rbd getomapval rbd_mirroring
> >>> status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93
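> >>>
> >>> If you then want to compare the two dumps without counting hex by hand,
> >>> a rough sketch (offsets read straight off the dump above, little-endian
> >>> host assumed -- not a general decoder for the on-disk format):
> >>>
> >>> $ rados -p rbd getomapval rbd_mirroring status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93 /tmp/status.bin
> >>> $ dd if=/tmp/status.bin bs=1 skip=7 count=8 2>/dev/null | od -An -tu8    <--- instance id (u64 LE)
> >>> $ dd if=/tmp/status.bin bs=1 skip=26 count=4 2>/dev/null | od -An -tu4   <--- nonce (u32 LE)
> >>> $ dd if=/tmp/status.bin bs=1 skip=38 count=4 2>/dev/null | od -An -tu1   <--- IPv4 octets (network order)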
> >>
> >> Interesting! Again, thanks for the detailed context - learning a bit more 
> >> about the internals is one of the many reasons why we love Ceph so much,
> >> and something which fully proprietary code will usually never deliver :-).
> >>
> >> Here's the output after the restart, image is in up+replaying state:
> >>
> >> # rados -p rbd getomapval rbd_mirroring status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93
> >> value (232 bytes) :
> >> 00000000  01 01 2c 00 00 00 08 ec  50 01 00 00 00 00 00 01  |..,.....P.......|
> >> 00000010  01 01 1c 00 00 00 03 00  00 00 0b 24 cd a5 10 00  |...........$....|
> >> 00000020  00 00 02 00 00 00 0a a0  13 f0 00 00 00 00 00 00  |................|
> >> 00000030  00 00 01 01 b0 00 00 00  04 a2 00 00 00 72 65 70  |.............rep|
> >> 00000040  6c 61 79 69 6e 67 2c 20  6d 61 73 74 65 72 5f 70  |laying, master_p|
> >> 00000050  6f 73 69 74 69 6f 6e 3d  5b 6f 62 6a 65 63 74 5f  |osition=[object_|
> >> 00000060  6e 75 6d 62 65 72 3d 31  38 2c 20 74 61 67 5f 74  |number=18, tag_t|
> >> 00000070  69 64 3d 36 2c 20 65 6e  74 72 79 5f 74 69 64 3d  |id=6, entry_tid=|
> >> 00000080  32 37 36 32 36 5d 2c 20  6d 69 72 72 6f 72 5f 70  |27626], mirror_p|
> >> 00000090  6f 73 69 74 69 6f 6e 3d  5b 6f 62 6a 65 63 74 5f  |osition=[object_|
> >> 000000a0  6e 75 6d 62 65 72 3d 31  38 2c 20 74 61 67 5f 74  |number=18, tag_t|
> >> 000000b0  69 64 3d 36 2c 20 65 6e  74 72 79 5f 74 69 64 3d  |id=6, entry_tid=|
> >> 000000c0  32 37 36 32 36 5d 2c 20  65 6e 74 72 69 65 73 5f  |27626], entries_|
> >> 000000d0  62 65 68 69 6e 64 5f 6d  61 73 74 65 72 3d 30 eb  |behind_master=0.|
> >> 000000e0  b3 7b 5d 27 9c d8 31 00                           |.{]'..1.|
> >> 000000e8
> >>
> >> IIUC, this decodes to instance ID 86252, IP address of course stayed the 
> >> same.
> >>
> >> Checking the other output:
> >>
> >> # ceph service status
> >> {
> >>       "rbd-mirror": {
> >>           "86248": {
> >>               "status_stamp": "2019-09-13 17:26:15.391048",
> >>               "last_beacon": "2019-09-13 17:26:25.391759",
> >>               "status": {
> >>                   "json": 
> >> "{\"2\":{\"name\":\"rbd\",\"callouts\":{},\"image_assigned_count\":51,\"image_error_count\":0,\"image_local_count\":51,\"image_remote_count\":51,\"image_warning_count\":0,\"instance_id\":\"86252\",\"leader\":true}}"
> >>               }
> >>           }
> >>       },
> >> ...
> >> }
> >>
> >> # rados -p rbd listwatchers rbd_mirroring
> >> watcher=10.160.19.240:0/2781684747 client.86252 cookie=140089552292144
> >> watcher=10.160.19.240:0/2781684747 client.86252 cookie=140090961572928
> >>
> >> This looks just as strange as before: the global instance ID is 86248, but
> >> the instance ID (and what I find in the omap dump) is 86252.
> >>
> >> However, things look okay in the dashboard again and also:
> >> # rbd mirror pool status
> >> health: OK
> >> images: 51 total
> >>       51 replaying
> >>
> >> Cheers,
> >>          Oliver
> >
> > Can you also provide the output from "ceph features"?
>
> Here you go:
> ------------------------------------------------------
> # ceph features
> {
>      "mon": [
>          {
>              "features": "0x3ffddff8ffacffff",
>              "release": "luminous",
>              "num": 1
>          }
>      ],
>      "osd": [
>          {
>              "features": "0x3ffddff8ffacffff",
>              "release": "luminous",
>              "num": 6
>          }
>      ],
>      "client": [
>          {
>              "features": "0x3ffddff8ffacffff",
>              "release": "luminous",
>              "num": 6
>          }
>      ],
>      "mgr": [
>          {
>              "features": "0x3ffddff8ffacffff",
>              "release": "luminous",
>              "num": 1
>          }
>      ]
> }
> ------------------------------------------------------
> This is a rather fresh Nautilus cluster, which has not yet seen any version 
> upgrade in its lifetime.
>
> Cheers,
>         Oliver
>
>
> >
> >>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>> Cheers,
> >>>> Oliver
> >>>>
> >>>>>
> >>>>>>> However, the dashboard still shows those images in "unknown", and 
> >>>>>>> this also shows up via command line:
> >>>>>>>
> >>>>>>> # rbd mirror pool status
> >>>>>>> health: WARNING
> >>>>>>> images: 51 total
> >>>>>>>         51 unknown
> >>>>>>> # rbd mirror image status test-vm.physik.uni-bonn.de-disk1
> >>>>>>> test-vm.physik.uni-bonn.de-disk2:
> >>>>>>>       global_id:   1a53fafa-37ef-4edf-9633-c2ba3323ed93
> >>>>>>>       state:       down+replaying
> >>>>>>>       description: replaying, master_position=[object_number=18, 
> >>>>>>> tag_tid=6, entry_tid=25202], mirror_position=[object_number=18, 
> >>>>>>> tag_tid=6, entry_tid=25202], entries_behind_master=0
> >>>>>>>       last_update: 2019-09-13 15:55:15
> >>>>>>>
> >>>>>>> Any ideas on what else could cause this?
> >>>>>>>
> >>>>>>> Cheers and thanks,
> >>>>>>>            Oliver
> >>>>>>>
> >>>>>>>>
> >>>>>>>>> Any idea on this (or how I can extract more information)?
> >>>>>>>>> I fear keeping high-level debug logs active for ~24h is not 
> >>>>>>>>> feasible.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>>             Oliver
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 2019-09-11 19:14, Jason Dillaman wrote:
> >>>>>>>>>> On Wed, Sep 11, 2019 at 12:57 PM Oliver Freyermuth
> >>>>>>>>>> <freyerm...@physik.uni-bonn.de> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Dear Jason,
> >>>>>>>>>>>
> >>>>>>>>>>> I played a bit more with rbd mirroring and learned that deleting 
> >>>>>>>>>>> an image at the source (or disabling journaling on it) 
> >>>>>>>>>>> immediately moves the image to trash at the target -
> >>>>>>>>>>> but setting rbd_mirroring_delete_delay helps to have some more 
> >>>>>>>>>>> grace time to catch human mistakes.
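> >>>>>>>>>>>
> >>>>>>>>>>> (For example, something like the following should give a full day
> >>>>>>>>>>> of grace time -- a sketch, assuming the config database is used:)
> >>>>>>>>>>>
> >>>>>>>>>>> # ceph config set client rbd_mirroring_delete_delay 86400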
> >>>>>>>>>>>
> >>>>>>>>>>> However, I have issues restoring such an image which has been 
> >>>>>>>>>>> moved to trash by the RBD-mirror daemon as user:
> >>>>>>>>>>> -----------------------------------
> >>>>>>>>>>> [root@mon001 ~]# rbd trash ls -la
> >>>>>>>>>>> ID           NAME                             SOURCE    
> >>>>>>>>>>> DELETED_AT               STATUS                                   
> >>>>>>>>>>> PARENT
> >>>>>>>>>>> d4fbe8f63905 test-vm-XXXXXXXXXXXXXXXXXX-disk2 MIRRORING Wed Sep 
> >>>>>>>>>>> 11 18:43:14 2019 protected until Thu Sep 12 18:43:14 2019
> >>>>>>>>>>> [root@mon001 ~]# rbd trash restore --image foo-image d4fbe8f63905
> >>>>>>>>>>> rbd: restore error: 2019-09-11 18:50:15.387 7f5fa9590b00 -1 
> >>>>>>>>>>> librbd::api::Trash: restore: Current trash source: mirroring does 
> >>>>>>>>>>> not match expected: user
> >>>>>>>>>>> (22) Invalid argument
> >>>>>>>>>>> -----------------------------------
> >>>>>>>>>>> This is issued on the mon, which has the client.admin key, so it 
> >>>>>>>>>>> should not be a permission issue.
> >>>>>>>>>>> It also fails when I try that in the Dashboard.
> >>>>>>>>>>>
> >>>>>>>>>>> Sadly, the error message is not clear enough for me to figure out 
> >>>>>>>>>>> what could be the problem - do you see what I did wrong?
> >>>>>>>>>>
> >>>>>>>>>> Good catch, it looks like we accidentally broke this in Nautilus 
> >>>>>>>>>> when
> >>>>>>>>>> image live-migration support was added. I've opened a new tracker
> >>>>>>>>>> ticket to fix this [1].
> >>>>>>>>>>
> >>>>>>>>>>> Cheers and thanks again,
> >>>>>>>>>>>             Oliver
> >>>>>>>>>>>
> >>>>>>>>>>> On 2019-09-10 23:17, Oliver Freyermuth wrote:
> >>>>>>>>>>>> Dear Jason,
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 2019-09-10 23:04, Jason Dillaman wrote:
> >>>>>>>>>>>>> On Tue, Sep 10, 2019 at 2:08 PM Oliver Freyermuth
> >>>>>>>>>>>>> <freyerm...@physik.uni-bonn.de> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Dear Jason,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2019-09-10 18:50, Jason Dillaman wrote:
> >>>>>>>>>>>>>>> On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
> >>>>>>>>>>>>>>> <freyerm...@physik.uni-bonn.de> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Dear Cephalopodians,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I have two questions about RBD mirroring.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 1) I can not get it to work - my setup is:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>          - One cluster holding the live RBD volumes and 
> >>>>>>>>>>>>>>>> snapshots, in pool "rbd", cluster name "ceph",
> >>>>>>>>>>>>>>>>            running latest Mimic.
> >>>>>>>>>>>>>>>>            I ran "rbd mirror pool enable rbd pool" on that 
> >>>>>>>>>>>>>>>> cluster and created a cephx user "rbd_mirror" with (is there 
> >>>>>>>>>>>>>>>> a better way?):
> >>>>>>>>>>>>>>>>            ceph auth get-or-create client.rbd_mirror mon 
> >>>>>>>>>>>>>>>> 'allow r' osd 'allow class-read object_prefix rbd_children, 
> >>>>>>>>>>>>>>>> allow pool rbd r' -o ceph.client.rbd_mirror.keyring 
> >>>>>>>>>>>>>>>> --cluster ceph
> >>>>>>>>>>>>>>>>            In that pool, two images have the journaling 
> >>>>>>>>>>>>>>>> feature activated, all others have it disabled still (so I 
> >>>>>>>>>>>>>>>> would expect these two to be mirrored).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> You can just use "mon 'profile rbd' osd 'profile rbd'" for 
> >>>>>>>>>>>>>>> the caps --
> >>>>>>>>>>>>>>> but you definitely need more than read-only permissions to 
> >>>>>>>>>>>>>>> the remote
> >>>>>>>>>>>>>>> cluster since it needs to be able to create snapshots of 
> >>>>>>>>>>>>>>> remote images
> >>>>>>>>>>>>>>> and update/trim the image journals.
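> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> i.e. something along these lines for the user above (a sketch;
> >>>>>>>>>>>>>>> adjust the user name to taste):
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> ceph auth get-or-create client.rbd_mirror \
> >>>>>>>>>>>>>>>     mon 'profile rbd' osd 'profile rbd' \
> >>>>>>>>>>>>>>>     -o ceph.client.rbd_mirror.keyring --cluster ceph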
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> these profiles really make life a lot easier. I should have 
> >>>>>>>>>>>>>> thought of them rather than "guessing" a potentially good 
> >>>>>>>>>>>>>> configuration...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>          - Another (empty) cluster running latest Nautilus, 
> >>>>>>>>>>>>>>>> cluster name "ceph", pool "rbd".
> >>>>>>>>>>>>>>>>            I've used the dashboard to activate mirroring for 
> >>>>>>>>>>>>>>>> the RBD pool, and then added a peer with cluster name 
> >>>>>>>>>>>>>>>> "ceph-virt", cephx-ID "rbd_mirror", filled in the mons and 
> >>>>>>>>>>>>>>>> key created above.
> >>>>>>>>>>>>>>>>            I've then run:
> >>>>>>>>>>>>>>>>            ceph auth get-or-create client.rbd_mirror_backup 
> >>>>>>>>>>>>>>>> mon 'allow r' osd 'allow class-read object_prefix 
> >>>>>>>>>>>>>>>> rbd_children, allow pool rbd rwx' -o 
> >>>>>>>>>>>>>>>> client.rbd_mirror_backup.keyring --cluster ceph
> >>>>>>>>>>>>>>>>            and deployed that key on the rbd-mirror machine, 
> >>>>>>>>>>>>>>>> and started the service with:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for 
> >>>>>>>>>>>>>>> your caps [1].
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> That did the trick (in combination with the above)!
> >>>>>>>>>>>>>> Again a case of PEBKAC: I should have read the documentation
> >>>>>>>>>>>>>> through to the end; clearly my fault.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> It works well now, even though it seems to run a bit slow (~35
> >>>>>>>>>>>>>> MB/s for the initial sync when everything is 1 GBit/s), but that
> >>>>>>>>>>>>>> may also be caused by a combination of some very limited hardware
> >>>>>>>>>>>>>> on the receiving end (which will be scaled up in the future).
> >>>>>>>>>>>>>> A single host with 6 disks, replica 3 and a RAID controller which
> >>>>>>>>>>>>>> can only do RAID0 and not JBOD is certainly not ideal, so commit
> >>>>>>>>>>>>>> latency may explain the low bandwidth.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> You could try increasing "rbd_concurrent_management_ops" from the
> >>>>>>>>>>>>> default of 10 ops to something higher to attempt to account for the
> >>>>>>>>>>>>> latency. However, I wouldn't expect near line-rate speeds w/ RBD
> >>>>>>>>>>>>> mirroring.
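> >>>>>>>>>>>>>
> >>>>>>>>>>>>> (For example -- a sketch, assuming you use the centralized config
> >>>>>>>>>>>>> database:)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> $ ceph config set client rbd_concurrent_management_ops 20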
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks - I will play with this option once we have more storage 
> >>>>>>>>>>>> available in the target pool ;-).
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>            systemctl start 
> >>>>>>>>>>>>>>>> ceph-rbd-mirror@rbd_mirror_backup.service
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>         After this, everything looks fine:
> >>>>>>>>>>>>>>>>          # rbd mirror pool info
> >>>>>>>>>>>>>>>>            Mode: pool
> >>>>>>>>>>>>>>>>            Peers:
> >>>>>>>>>>>>>>>>             UUID                                 NAME      
> >>>>>>>>>>>>>>>> CLIENT
> >>>>>>>>>>>>>>>>             XXXXXXXXXXX                          ceph-virt 
> >>>>>>>>>>>>>>>> client.rbd_mirror
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>         The service also seems to start fine, but logs show 
> >>>>>>>>>>>>>>>> (debug rbd_mirror=20):
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>         rbd::mirror::ClusterWatcher:0x5575e2a7d390 
> >>>>>>>>>>>>>>>> resolve_peer_config_keys: retrieving config-key: pool_id=2, 
> >>>>>>>>>>>>>>>> pool_name=rbd, peer_uuid=XXXXXXXXXXX
> >>>>>>>>>>>>>>>>         rbd::mirror::Mirror: 0x5575e29c7240 
> >>>>>>>>>>>>>>>> update_pool_replayers: enter
> >>>>>>>>>>>>>>>>         rbd::mirror::Mirror: 0x5575e29c7240 
> >>>>>>>>>>>>>>>> update_pool_replayers: restarting failed pool replayer for 
> >>>>>>>>>>>>>>>> uuid: XXXXXXXXXXX cluster: ceph-virt client: 
> >>>>>>>>>>>>>>>> client.rbd_mirror
> >>>>>>>>>>>>>>>>         rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: 
> >>>>>>>>>>>>>>>> replaying for uuid: XXXXXXXXXXX cluster: ceph-virt client: 
> >>>>>>>>>>>>>>>> client.rbd_mirror
> >>>>>>>>>>>>>>>>         rbd::mirror::PoolReplayer: 0x5575e2a7da20 
> >>>>>>>>>>>>>>>> init_rados: error connecting to remote peer uuid: 
> >>>>>>>>>>>>>>>> XXXXXXXXXXX cluster: ceph-virt client: client.rbd_mirror: 
> >>>>>>>>>>>>>>>> (95) Operation not supported
> >>>>>>>>>>>>>>>>         rbd::mirror::ServiceDaemon: 0x5575e29c8d70 
> >>>>>>>>>>>>>>>> add_or_update_callout: pool_id=2, callout_id=2, 
> >>>>>>>>>>>>>>>> callout_level=error, text=unable to connect to remote cluster
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If it's still broken after fixing your caps above, perhaps 
> >>>>>>>>>>>>>>> increase
> >>>>>>>>>>>>>>> debugging for "rados", "monc", "auth", and "ms" to see if you 
> >>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>> determine the source of the op not supported error.
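> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> e.g. in ceph.conf on the mirror node (a sketch; the section name
> >>>>>>>>>>>>>>> matches the cephx user your daemon runs as):
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [client.rbd_mirror_backup]
> >>>>>>>>>>>>>>>     debug rados = 20
> >>>>>>>>>>>>>>>     debug monc = 20
> >>>>>>>>>>>>>>>     debug auth = 20
> >>>>>>>>>>>>>>>     debug ms = 1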
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I already tried storing the ceph.client.rbd_mirror.keyring 
> >>>>>>>>>>>>>>>> (i.e. from the cluster with the live images) on the 
> >>>>>>>>>>>>>>>> rbd-mirror machine explicitly (i.e. not only in mon config 
> >>>>>>>>>>>>>>>> storage),
> >>>>>>>>>>>>>>>> and after doing that:
> >>>>>>>>>>>>>>>>        rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls
> >>>>>>>>>>>>>>>> works fine. So it's not a connectivity issue. Maybe a 
> >>>>>>>>>>>>>>>> permission issue? Or did I miss something?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Any idea what "operation not supported" means?
> >>>>>>>>>>>>>>>> It's unclear to me whether things should work well using
> >>>>>>>>>>>>>>>> Mimic with Nautilus, and whether enabling pool mirroring while
> >>>>>>>>>>>>>>>> only having journaling on for two images is a supported case.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Yes and yes.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 2) Since there is a performance drawback (about 2x) for 
> >>>>>>>>>>>>>>>> journaling, is it also possible to only mirror snapshots, 
> >>>>>>>>>>>>>>>> and leave the live volumes alone?
> >>>>>>>>>>>>>>>>          This would cover the common backup use case before
> >>>>>>>>>>>>>>>> deferred mirroring is implemented (or is it there already?).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This is in development right now and will hopefully land for the
> >>>>>>>>>>>>>>> Octopus release.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> That would be very cool. Just to clarify: You mean the "real" 
> >>>>>>>>>>>>>> deferred mirroring, not a "snapshot only" mirroring?
> >>>>>>>>>>>>>> Is it already clear if this will require Octopus (or a later
> >>>>>>>>>>>>>> release) on both ends, or only on the receiving side?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm not sure what you mean by deferred mirroring. You can delay
> >>>>>>>>>>>>> the replay of the journal via the "rbd_mirroring_replay_delay"
> >>>>>>>>>>>>> configuration option so that your DR site is at least X seconds
> >>>>>>>>>>>>> behind the primary.
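> >>>>>>>>>>>>>
> >>>>>>>>>>>>> (For example, to keep the DR site at least five minutes behind --
> >>>>>>>>>>>>> a sketch, same config-database caveat as above:)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> $ ceph config set client rbd_mirroring_replay_delay 300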
> >>>>>>>>>>>>
> >>>>>>>>>>>> This is indeed what I was thinking of...
> >>>>>>>>>>>>
> >>>>>>>>>>>>> For Octopus we are working on on-demand and
> >>>>>>>>>>>>> scheduled snapshot mirroring between sites -- no journal is 
> >>>>>>>>>>>>> involved.
> >>>>>>>>>>>>
> >>>>>>>>>>>> ... and this is what I was dreaming of. We keep snapshots of VMs 
> >>>>>>>>>>>> to be able to roll them back.
> >>>>>>>>>>>> We'd like to also keep those snapshots in a separate Ceph 
> >>>>>>>>>>>> instance as an additional safety-net (in addition to an offline 
> >>>>>>>>>>>> backup of those snapshots with Benji backup).
> >>>>>>>>>>>> It is not (yet) clear to me whether we can pay the "2 x" price 
> >>>>>>>>>>>> for journaling in the long run, so this would be the way to go 
> >>>>>>>>>>>> in case we can't.
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Since I've got you here, I have two bonus questions.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1) Your talk:
> >>>>>>>>>>>>>>         
> >>>>>>>>>>>>>> https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring.pdf
> >>>>>>>>>>>>>>         mentions "rbd journal object flush age", which I'd 
> >>>>>>>>>>>>>> translate with something like the "commit" mount option on a 
> >>>>>>>>>>>>>> classical file system - correct?
> >>>>>>>>>>>>>>         I don't find this switch documented anywhere, though - 
> >>>>>>>>>>>>>> is there experience with it / what's the default?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It's a low-level knob that by default causes the journal to flush
> >>>>>>>>>>>>> its pending IO events before it allows the corresponding IO to be
> >>>>>>>>>>>>> issued against the backing image. Setting it to a value greater
> >>>>>>>>>>>>> than zero will allow that many seconds of IO events to be batched
> >>>>>>>>>>>>> together in a journal append operation, and it's helpful for
> >>>>>>>>>>>>> high-throughput, small-IO workloads. Of course, it turned out that
> >>>>>>>>>>>>> a bug had broken that option a while back such that events would
> >>>>>>>>>>>>> never batch, so a fix is currently scheduled for backport to all
> >>>>>>>>>>>>> active releases [1] w/ the goal that no one should need to tweak it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> That's even better - since our setup is growing and we will keep 
> >>>>>>>>>>>> upgrading, I'll then just keep things as they are now (no manual 
> >>>>>>>>>>>> tweaking)
> >>>>>>>>>>>> and tag along the development. Thanks!
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> 2) I read I can run more than one rbd-mirror with 
> >>>>>>>>>>>>>> Mimic/Nautilus. Do they load-balance the images, or "only" 
> >>>>>>>>>>>>>> failover in case one of them dies?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Starting with Nautilus, the default configuration for rbd-mirror is
> >>>>>>>>>>>>> to evenly divide the number of mirrored images between all running
> >>>>>>>>>>>>> daemons. This does not balance the actual load, since some images
> >>>>>>>>>>>>> might be hotter than others, but it at least spreads the work.
> >>>>>>>>>>>>
> >>>>>>>>>>>> That's fine enough for our use case. Spreading by "hotness" is a 
> >>>>>>>>>>>> task without a clear answer
> >>>>>>>>>>>> and "temperature" may change quickly, so that's all I hoped for.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Many thanks again for the very helpful explanations!
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>           Oliver
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers and many thanks for the quick and perfect help!
> >>>>>>>>>>>>>>              Oliver
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Cheers and thanks in advance,
> >>>>>>>>>>>>>>>>              Oliver
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [1] 
> >>>>>>>>>>>>>>> https://docs.ceph.com/docs/master/rbd/rbd-mirroring/#rbd-mirror-daemon
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>> Jason
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1] https://github.com/ceph/ceph/pull/28539
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> [1] https://tracker.ceph.com/issues/41780
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Jason
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Jason
> >>>
> >>
> >>
> >
> >
>

[1] https://tracker.ceph.com/issues/41833

-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
