Re: [ceph-users] Ceph RBD Mirroring

Oliver Freyermuth Sat, 14 Sep 2019 15:13:17 -0700

Dear Jason,

On 15.09.19 at 00:03, Jason Dillaman wrote:
> I was able to reproduce this issue locally by restarting the primary OSD
> for the "rbd_mirroring" object. It seems that a regression was
> introduced along with Ceph msgr2: upon reconnect,
> the connection type for the client switches from ANY to V2 -- but only
> for the watcher session and not the status updates. I've opened a
> tracker ticket for this issue [1].
> 
> Thanks.


Many thanks for the detailed investigation and reproduction!
While I did not restart the first 5 OSDs of the test cluster, I did add an OSD
and rebalance, so I guess this can also be triggered when the primary OSD for
the object changes, which of course should also lead to a reconnection.
I can add to my observations that, now that I am no longer touching the
cluster, things stay in "up+replaying".
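
In case it is useful for others hitting this: the ID comparison we did by hand in the messages quoted below can be sketched in a few lines of Python. This is only a rough sketch under assumptions. It parses trimmed copies of the output pasted in this thread instead of calling any Ceph command, and the helper names daemon_ids and watcher_ids are made up for illustration.

```python
import json
import re

# Trimmed copies of the outputs quoted below (down+replaying state).
SERVICE_STATUS = r'''
{
    "rbd-mirror": {
        "84243": {
            "status": {
                "json": "{\"2\":{\"name\":\"rbd\",\"instance_id\":\"84247\",\"leader\":true}}"
            }
        }
    }
}
'''

LISTWATCHERS = """\
watcher=10.160.19.240:0/2922488671 client.84247 cookie=139770046978672
watcher=10.160.19.240:0/2922488671 client.84247 cookie=139771389162560
"""


def daemon_ids(status_text):
    """Map each rbd-mirror global id to the instance ids it reports per pool."""
    daemons = json.loads(status_text).get("rbd-mirror", {})
    result = {}
    for global_id, info in daemons.items():
        # The per-pool status is itself a JSON document stored as a string.
        pools = json.loads(info["status"]["json"])
        result[global_id] = {pool["instance_id"] for pool in pools.values()}
    return result


def watcher_ids(listwatchers_text):
    """Collect the numeric ids from the 'client.<id>' fields."""
    return set(re.findall(r"client\.(\d+)", listwatchers_text))


watchers = watcher_ids(LISTWATCHERS)
for global_id, instance_ids in daemon_ids(SERVICE_STATUS).items():
    print("global id", global_id, "is a watcher:", global_id in watchers)
    for instance_id in sorted(instance_ids):
        print("instance id", instance_id, "is a watcher:", instance_id in watchers)
```

For the data above, the daemon's global id (84243) does not appear among the watchers, while its reported instance id (84247) does, which is the mismatch Jason pointed out.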

Thanks and all the best,
        Oliver

> 
> On Fri, Sep 13, 2019 at 12:44 PM Oliver Freyermuth
> <[email protected]> wrote:
>>
>> On 13.09.19 at 18:38, Jason Dillaman wrote:
>>> On Fri, Sep 13, 2019 at 11:30 AM Oliver Freyermuth
>>> <[email protected]> wrote:
>>>>
>>>> On 13.09.19 at 17:18, Jason Dillaman wrote:
>>>>> On Fri, Sep 13, 2019 at 10:41 AM Oliver Freyermuth
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> On 13.09.19 at 16:30, Jason Dillaman wrote:
>>>>>>> On Fri, Sep 13, 2019 at 10:17 AM Jason Dillaman <[email protected]> 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Fri, Sep 13, 2019 at 10:02 AM Oliver Freyermuth
>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Dear Jason,
>>>>>>>>>
>>>>>>>>> thanks for the very detailed explanation! This was very instructive.
>>>>>>>>> Sadly, the watchers look correct - see details inline.
>>>>>>>>>
>>>>>>>>> On 13.09.19 at 15:02, Jason Dillaman wrote:
>>>>>>>>>> On Thu, Sep 12, 2019 at 9:55 PM Oliver Freyermuth
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Dear Jason,
>>>>>>>>>>>
>>>>>>>>>>> thanks for taking care and developing a patch so quickly!
>>>>>>>>>>>
>>>>>>>>>>> I have another strange observation to share. In our test setup, 
>>>>>>>>>>> only a single RBD mirroring daemon is running for 51 images.
>>>>>>>>>>> It works fine with a constant stream of 1-2 MB/s, but at some point 
>>>>>>>>>>> after roughly 20 hours, _all_ images go to this interesting state:
>>>>>>>>>>> -----------------------------------------
>>>>>>>>>>> # rbd mirror image status test-vm.XXXXX-disk2
>>>>>>>>>>> test-vm.XXXXX-disk2:
>>>>>>>>>>>       global_id:   XXXXXXXXXXXXXXX
>>>>>>>>>>>       state:       down+replaying
>>>>>>>>>>>       description: replaying, master_position=[object_number=14, 
>>>>>>>>>>> tag_tid=6, entry_tid=6338], mirror_position=[object_number=14, 
>>>>>>>>>>> tag_tid=6, entry_tid=6338], entries_behind_master=0
>>>>>>>>>>>       last_update: 2019-09-13 03:45:43
>>>>>>>>>>> -----------------------------------------
>>>>>>>>>>> Running this command several times, I see entry_tid increasing at 
>>>>>>>>>>> both ends, so mirroring seems to be working just fine.
>>>>>>>>>>>
>>>>>>>>>>> However:
>>>>>>>>>>> -----------------------------------------
>>>>>>>>>>> # rbd mirror pool status
>>>>>>>>>>> health: WARNING
>>>>>>>>>>> images: 51 total
>>>>>>>>>>>         51 unknown
>>>>>>>>>>> -----------------------------------------
>>>>>>>>>>> The health warning is not visible in the dashboard (also not in the 
>>>>>>>>>>> mirroring menu), the daemon still seems to be running, dropped 
>>>>>>>>>>> nothing in the logs,
>>>>>>>>>>> and claims to be "ok" in the dashboard - it's only that all images 
>>>>>>>>>>> show up in unknown state even though all seems to be working fine.
>>>>>>>>>>>
>>>>>>>>>>> Any idea on how to debug this?
>>>>>>>>>>> When I restart the rbd-mirror service, all images come back as 
>>>>>>>>>>> green. I already encountered this twice in 3 days.
>>>>>>>>>>
>>>>>>>>>> The dashboard relies on the rbd-mirror daemon to provide it with
>>>>>>>>>> errors and warnings. You can see the status reported by rbd-mirror
>>>>>>>>>> by running "ceph service status":
>>>>>>>>>>
>>>>>>>>>> $ ceph service status
>>>>>>>>>> {
>>>>>>>>>>         "rbd-mirror": {
>>>>>>>>>>             "4152": {
>>>>>>>>>>                 "status_stamp": "2019-09-13T08:58:41.937491-0400",
>>>>>>>>>>                 "last_beacon": "2019-09-13T08:58:41.937491-0400",
>>>>>>>>>>                 "status": {
>>>>>>>>>>                     "json":
>>>>>>>>>> "{\"1\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":1,\"image_remote_count\":1,\"image_warning_count\":0,\"instance_id\":\"4154\",\"leader\":true},\"2\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0,\"instance_id\":\"4156\",\"leader\":true}}"
>>>>>>>>>>                 }
>>>>>>>>>>             }
>>>>>>>>>>         }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> In your case, it seems most likely that rbd-mirror thinks all is good
>>>>>>>>>> with the world, so it's not reporting any errors.
>>>>>>>>>
>>>>>>>>> This is indeed the case:
>>>>>>>>>
>>>>>>>>> # ceph service status
>>>>>>>>> {
>>>>>>>>>         "rbd-mirror": {
>>>>>>>>>             "84243": {
>>>>>>>>>                 "status_stamp": "2019-09-13 15:40:01.149815",
>>>>>>>>>                 "last_beacon": "2019-09-13 15:40:26.151381",
>>>>>>>>>                 "status": {
>>>>>>>>>                     "json": 
>>>>>>>>> "{\"2\":{\"name\":\"rbd\",\"callouts\":{},\"image_assigned_count\":51,\"image_error_count\":0,\"image_local_count\":51,\"image_remote_count\":51,\"image_warning_count\":0,\"instance_id\":\"84247\",\"leader\":true}}"
>>>>>>>>>                 }
>>>>>>>>>             }
>>>>>>>>>         },
>>>>>>>>>         "rgw": {
>>>>>>>>> ...
>>>>>>>>>         }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>> The "down" state indicates that the rbd-mirror daemon isn't correctly
>>>>>>>>>> watching the "rbd_mirroring" object in the pool. You can see who is
>>>>>>>>>> watching that object by running the "rados" "listwatchers" command:
>>>>>>>>>>
>>>>>>>>>> $ rados -p <pool name> listwatchers rbd_mirroring
>>>>>>>>>> watcher=1.2.3.4:0/199388543 client.4154 cookie=94769010788992
>>>>>>>>>> watcher=1.2.3.4:0/199388543 client.4154 cookie=94769061031424
>>>>>>>>>>
>>>>>>>>>> In my case, the "4154" from "client.4154" is the unique global id for
>>>>>>>>>> my connection to the cluster, which relates back to the "ceph service
>>>>>>>>>> status" dump which also shows status by daemon using the unique
>>>>>>>>>> global id.
>>>>>>>>>
>>>>>>>>> Sadly(?), this looks as expected:
>>>>>>>>>
>>>>>>>>> # rados -p rbd listwatchers rbd_mirroring
>>>>>>>>> watcher=10.160.19.240:0/2922488671 client.84247 cookie=139770046978672
>>>>>>>>> watcher=10.160.19.240:0/2922488671 client.84247 cookie=139771389162560
>>>>>>>>
>>>>>>>> Hmm, the unique id is different (84243 vs 84247). I wouldn't have
>>>>>>>> expected the global id to have changed. Did you restart the Ceph
>>>>>>>> cluster or MONs? Do you see any "peer assigned me a different
>>>>>>>> global_id" errors in your rbd-mirror logs?
>>>>>>>>
>>>>>>>> I'll open a tracker ticket to fix the "ceph service status", though,
>>>>>>>> since clearly your global id changed but it wasn't noticed by the
>>>>>>>> service daemon status updater.
>>>>>>>
>>>>>>> ... also, can you please provide the output from the following via a
>>>>>>> pastebin link?
>>>>>>>
>>>>>>> # rados -p rbd listomapvals rbd_mirroring
>>>>>>
>>>>>> Of course, here you go:
>>>>>> https://0x0.st/zy8J.txt
>>>>>
>>>>> Thanks. For the case above of global image id
>>>>> 1a53fafa-37ef-4edf-9633-c2ba3323ed93, the on-disk status shows that it
>>>>> was last updated by client.84247 / nonce 2922488671, which correctly
>>>>> matches your watcher so the status should be "up":
>>>>>
>>>>> status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93
>>>>> value (232 bytes) :
>>>>> 00000000  01 01 2c 00 00 00 08 17  49 01 00 00 00 00 00 01  |..,.....I.......|  <--- "17  49 01 00 00 00 00 00" (84247) is the instance id
>>>>> 00000010  01 01 1c 00 00 00 03 00  00 00 5f a3 31 ae 10 00  |.........._.1...|  <--- "5f a3 31 ae" is the nonce (2922488671)
>>>>> 00000020  00 00 02 00 00 00 0a a0  13 f0 00 00 00 00 00 00  |................|  <--- "0a a0  13 f0" is the IP address (10.160.19.240)
>>>>> 00000030  00 00 01 01 b0 00 00 00  04 a2 00 00 00 72 65 70  |.............rep|
>>>>> 00000040  6c 61 79 69 6e 67 2c 20  6d 61 73 74 65 72 5f 70  |laying, master_p|
>>>>> 00000050  6f 73 69 74 69 6f 6e 3d  5b 6f 62 6a 65 63 74 5f  |osition=[object_|
>>>>> 00000060  6e 75 6d 62 65 72 3d 31  39 2c 20 74 61 67 5f 74  |number=19, tag_t|
>>>>> 00000070  69 64 3d 36 2c 20 65 6e  74 72 79 5f 74 69 64 3d  |id=6, entry_tid=|
>>>>> 00000080  32 36 34 34 33 5d 2c 20  6d 69 72 72 6f 72 5f 70  |26443], mirror_p|
>>>>> 00000090  6f 73 69 74 69 6f 6e 3d  5b 6f 62 6a 65 63 74 5f  |osition=[object_|
>>>>> 000000a0  6e 75 6d 62 65 72 3d 31  39 2c 20 74 61 67 5f 74  |number=19, tag_t|
>>>>> 000000b0  69 64 3d 36 2c 20 65 6e  74 72 79 5f 74 69 64 3d  |id=6, entry_tid=|
>>>>> 000000c0  32 36 34 34 33 5d 2c 20  65 6e 74 72 69 65 73 5f  |26443], entries_|
>>>>> 000000d0  62 65 68 69 6e 64 5f 6d  61 73 74 65 72 3d 30 51  |behind_master=0Q|
>>>>> 000000e0  aa 7b 5d 1b 5f 4f 33 00                           |.{]._O3.|
>>>>> 000000e8
>>>>>
>>>>> The only thing I can think of is that somehow the watcher entity
>>>>> instance has a different encoding and it's failing a comparison. Can
>>>>> you restart rbd-mirror such that the statuses list "up+replaying" and
>>>>> then run the following?
>>>>>
>>>>> # rados -p rbd getomapval rbd_mirroring status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93
>>>>
>>>> Interesting! Again, thanks for the detailed context - learning a bit more 
>>>> about the internals is one of the many reasons why we love Ceph so much,
>>>> and something which fully proprietary code will usually never deliver :-).
>>>>
>>>> Here's the output after the restart, image is in up+replaying state:
>>>>
>>>> # rados -p rbd getomapval rbd_mirroring status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93
>>>> value (232 bytes) :
>>>> 00000000  01 01 2c 00 00 00 08 ec  50 01 00 00 00 00 00 01  |..,.....P.......|
>>>> 00000010  01 01 1c 00 00 00 03 00  00 00 0b 24 cd a5 10 00  |...........$....|
>>>> 00000020  00 00 02 00 00 00 0a a0  13 f0 00 00 00 00 00 00  |................|
>>>> 00000030  00 00 01 01 b0 00 00 00  04 a2 00 00 00 72 65 70  |.............rep|
>>>> 00000040  6c 61 79 69 6e 67 2c 20  6d 61 73 74 65 72 5f 70  |laying, master_p|
>>>> 00000050  6f 73 69 74 69 6f 6e 3d  5b 6f 62 6a 65 63 74 5f  |osition=[object_|
>>>> 00000060  6e 75 6d 62 65 72 3d 31  38 2c 20 74 61 67 5f 74  |number=18, tag_t|
>>>> 00000070  69 64 3d 36 2c 20 65 6e  74 72 79 5f 74 69 64 3d  |id=6, entry_tid=|
>>>> 00000080  32 37 36 32 36 5d 2c 20  6d 69 72 72 6f 72 5f 70  |27626], mirror_p|
>>>> 00000090  6f 73 69 74 69 6f 6e 3d  5b 6f 62 6a 65 63 74 5f  |osition=[object_|
>>>> 000000a0  6e 75 6d 62 65 72 3d 31  38 2c 20 74 61 67 5f 74  |number=18, tag_t|
>>>> 000000b0  69 64 3d 36 2c 20 65 6e  74 72 79 5f 74 69 64 3d  |id=6, entry_tid=|
>>>> 000000c0  32 37 36 32 36 5d 2c 20  65 6e 74 72 69 65 73 5f  |27626], entries_|
>>>> 000000d0  62 65 68 69 6e 64 5f 6d  61 73 74 65 72 3d 30 eb  |behind_master=0.|
>>>> 000000e0  b3 7b 5d 27 9c d8 31 00                           |.{]'..1.|
>>>> 000000e8
>>>>
>>>> IIUC, this decodes to instance ID 86252, IP address of course stayed the 
>>>> same.
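
The decoding above can be reproduced with a short Python sketch. The byte offsets (7 for the instance id, 26 for the nonce, 38 for the IPv4 address) are simply read off Jason's annotations on the first dump, not taken from a description of the encoding, so please treat them as assumptions; decode_header is a made-up helper name.

```python
import ipaddress
import struct

# First 0x30 bytes of the two omap values quoted above.
BEFORE_RESTART = bytes.fromhex(
    "01 01 2c 00 00 00 08 17 49 01 00 00 00 00 00 01 "
    "01 01 1c 00 00 00 03 00 00 00 5f a3 31 ae 10 00 "
    "00 00 02 00 00 00 0a a0 13 f0 00 00 00 00 00 00"
)
AFTER_RESTART = bytes.fromhex(
    "01 01 2c 00 00 00 08 ec 50 01 00 00 00 00 00 01 "
    "01 01 1c 00 00 00 03 00 00 00 0b 24 cd a5 10 00 "
    "00 00 02 00 00 00 0a a0 13 f0 00 00 00 00 00 00"
)


def decode_header(raw):
    """Return (instance id, nonce, IPv4 address) from the assumed offsets."""
    (instance_id,) = struct.unpack_from("<Q", raw, 7)   # 8 bytes, little-endian
    (nonce,) = struct.unpack_from("<I", raw, 26)        # 4 bytes, little-endian
    address = ipaddress.IPv4Address(raw[38:42])         # 4 bytes, network order
    return instance_id, nonce, str(address)


print(decode_header(BEFORE_RESTART))  # (84247, 2922488671, '10.160.19.240')
print(decode_header(AFTER_RESTART))   # (86252, 2781684747, '10.160.19.240')
```

Both nonces match the watcher entries reported by listwatchers (0/2922488671 before the restart, 0/2781684747 after it).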
>>>>
>>>> Checking the other output:
>>>>
>>>> # ceph service status
>>>> {
>>>>       "rbd-mirror": {
>>>>           "86248": {
>>>>               "status_stamp": "2019-09-13 17:26:15.391048",
>>>>               "last_beacon": "2019-09-13 17:26:25.391759",
>>>>               "status": {
>>>>                   "json": 
>>>> "{\"2\":{\"name\":\"rbd\",\"callouts\":{},\"image_assigned_count\":51,\"image_error_count\":0,\"image_local_count\":51,\"image_remote_count\":51,\"image_warning_count\":0,\"instance_id\":\"86252\",\"leader\":true}}"
>>>>               }
>>>>           }
>>>>       },
>>>> ...
>>>> }
>>>>
>>>> # rados -p rbd listwatchers rbd_mirroring
>>>> watcher=10.160.19.240:0/2781684747 client.86252 cookie=140089552292144
>>>> watcher=10.160.19.240:0/2781684747 client.86252 cookie=140090961572928
>>>>
>>>> This looks as strange as before again: Global instance ID is 86248, but 
>>>> instance ID (and what I find in the omap dump) is 86252.
>>>>
>>>> However, things look okay in the dashboard again and also:
>>>> # rbd mirror pool status
>>>> health: OK
>>>> images: 51 total
>>>>       51 replaying
>>>>
>>>> Cheers,
>>>>          Oliver
>>>
>>> Can you also provide the output from "ceph features"?
>>
>> Here you go:
>> ------------------------------------------------------
>> # ceph features
>> {
>>      "mon": [
>>          {
>>              "features": "0x3ffddff8ffacffff",
>>              "release": "luminous",
>>              "num": 1
>>          }
>>      ],
>>      "osd": [
>>          {
>>              "features": "0x3ffddff8ffacffff",
>>              "release": "luminous",
>>              "num": 6
>>          }
>>      ],
>>      "client": [
>>          {
>>              "features": "0x3ffddff8ffacffff",
>>              "release": "luminous",
>>              "num": 6
>>          }
>>      ],
>>      "mgr": [
>>          {
>>              "features": "0x3ffddff8ffacffff",
>>              "release": "luminous",
>>              "num": 1
>>          }
>>      ]
>> }
>> ------------------------------------------------------
>> This is a rather fresh Nautilus cluster, which has not yet seen any version 
>> upgrade in its lifetime.
>>
>> Cheers,
>>         Oliver
>>
>>
>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Cheers,
>>>>>> Oliver
>>>>>>
>>>>>>>
>>>>>>>>> However, the dashboard still shows those images in "unknown", and 
>>>>>>>>> this also shows up via command line:
>>>>>>>>>
>>>>>>>>> # rbd mirror pool status
>>>>>>>>> health: WARNING
>>>>>>>>> images: 51 total
>>>>>>>>>         51 unknown
>>>>>>>>> # rbd mirror image status test-vm.physik.uni-bonn.de-disk1
>>>>>>>>> test-vm.physik.uni-bonn.de-disk2:
>>>>>>>>>       global_id:   1a53fafa-37ef-4edf-9633-c2ba3323ed93
>>>>>>>>>       state:       down+replaying
>>>>>>>>>       description: replaying, master_position=[object_number=18, 
>>>>>>>>> tag_tid=6, entry_tid=25202], mirror_position=[object_number=18, 
>>>>>>>>> tag_tid=6, entry_tid=25202], entries_behind_master=0
>>>>>>>>>       last_update: 2019-09-13 15:55:15
>>>>>>>>>
>>>>>>>>> Any ideas on what else could cause this?
>>>>>>>>>
>>>>>>>>> Cheers and thanks,
>>>>>>>>>            Oliver
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Any idea on this (or how I can extract more information)?
>>>>>>>>>>> I fear keeping high-level debug logs active for ~24h is not 
>>>>>>>>>>> feasible.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>             Oliver
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2019-09-11 19:14, Jason Dillaman wrote:
>>>>>>>>>>>> On Wed, Sep 11, 2019 at 12:57 PM Oliver Freyermuth
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dear Jason,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I played a bit more with rbd mirroring and learned that deleting 
>>>>>>>>>>>>> an image at the source (or disabling journaling on it) 
>>>>>>>>>>>>> immediately moves the image to trash at the target -
>>>>>>>>>>>>> but setting rbd_mirroring_delete_delay helps to have some more 
>>>>>>>>>>>>> grace time to catch human mistakes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, I have issues restoring such an image which has been 
>>>>>>>>>>>>> moved to trash by the RBD-mirror daemon as user:
>>>>>>>>>>>>> -----------------------------------
>>>>>>>>>>>>> [root@mon001 ~]# rbd trash ls -la
>>>>>>>>>>>>> ID           NAME                             SOURCE    DELETED_AT               STATUS                                   PARENT
>>>>>>>>>>>>> d4fbe8f63905 test-vm-XXXXXXXXXXXXXXXXXX-disk2 MIRRORING Wed Sep 11 18:43:14 2019 protected until Thu Sep 12 18:43:14 2019
>>>>>>>>>>>>> [root@mon001 ~]# rbd trash restore --image foo-image d4fbe8f63905
>>>>>>>>>>>>> rbd: restore error: 2019-09-11 18:50:15.387 7f5fa9590b00 -1 
>>>>>>>>>>>>> librbd::api::Trash: restore: Current trash source: mirroring does 
>>>>>>>>>>>>> not match expected: user
>>>>>>>>>>>>> (22) Invalid argument
>>>>>>>>>>>>> -----------------------------------
>>>>>>>>>>>>> This is issued on the mon, which has the client.admin key, so it 
>>>>>>>>>>>>> should not be a permission issue.
>>>>>>>>>>>>> It also fails when I try that in the Dashboard.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sadly, the error message is not clear enough for me to figure out 
>>>>>>>>>>>>> what could be the problem - do you see what I did wrong?
>>>>>>>>>>>>
>>>>>>>>>>>> Good catch, it looks like we accidentally broke this in Nautilus 
>>>>>>>>>>>> when
>>>>>>>>>>>> image live-migration support was added. I've opened a new tracker
>>>>>>>>>>>> ticket to fix this [1].
>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers and thanks again,
>>>>>>>>>>>>>             Oliver
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2019-09-10 23:17, Oliver Freyermuth wrote:
>>>>>>>>>>>>>> Dear Jason,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2019-09-10 23:04, Jason Dillaman wrote:
>>>>>>>>>>>>>>> On Tue, Sep 10, 2019 at 2:08 PM Oliver Freyermuth
>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dear Jason,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2019-09-10 18:50, Jason Dillaman wrote:
>>>>>>>>>>>>>>>>> On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Dear Cephalopodians,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have two questions about RBD mirroring.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1) I can not get it to work - my setup is:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>          - One cluster holding the live RBD volumes and 
>>>>>>>>>>>>>>>>>> snapshots, in pool "rbd", cluster name "ceph",
>>>>>>>>>>>>>>>>>>            running latest Mimic.
>>>>>>>>>>>>>>>>>>            I ran "rbd mirror pool enable rbd pool" on that 
>>>>>>>>>>>>>>>>>> cluster and created a cephx user "rbd_mirror" with (is there 
>>>>>>>>>>>>>>>>>> a better way?):
>>>>>>>>>>>>>>>>>>            ceph auth get-or-create client.rbd_mirror mon 
>>>>>>>>>>>>>>>>>> 'allow r' osd 'allow class-read object_prefix rbd_children, 
>>>>>>>>>>>>>>>>>> allow pool rbd r' -o ceph.client.rbd_mirror.keyring 
>>>>>>>>>>>>>>>>>> --cluster ceph
>>>>>>>>>>>>>>>>>>            In that pool, two images have the journaling 
>>>>>>>>>>>>>>>>>> feature activated, all others have it disabled still (so I 
>>>>>>>>>>>>>>>>>> would expect these two to be mirrored).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You can just use "mon 'profile rbd' osd 'profile rbd'" for 
>>>>>>>>>>>>>>>>> the caps --
>>>>>>>>>>>>>>>>> but you definitely need more than read-only permissions to 
>>>>>>>>>>>>>>>>> the remote
>>>>>>>>>>>>>>>>> cluster since it needs to be able to create snapshots of 
>>>>>>>>>>>>>>>>> remote images
>>>>>>>>>>>>>>>>> and update/trim the image journals.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> these profiles really make life a lot easier. I should have 
>>>>>>>>>>>>>>>> thought of them rather than "guessing" a potentially good 
>>>>>>>>>>>>>>>> configuration...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>          - Another (empty) cluster running latest Nautilus, 
>>>>>>>>>>>>>>>>>> cluster name "ceph", pool "rbd".
>>>>>>>>>>>>>>>>>>            I've used the dashboard to activate mirroring for 
>>>>>>>>>>>>>>>>>> the RBD pool, and then added a peer with cluster name 
>>>>>>>>>>>>>>>>>> "ceph-virt", cephx-ID "rbd_mirror", filled in the mons and 
>>>>>>>>>>>>>>>>>> key created above.
>>>>>>>>>>>>>>>>>>            I've then run:
>>>>>>>>>>>>>>>>>>            ceph auth get-or-create client.rbd_mirror_backup 
>>>>>>>>>>>>>>>>>> mon 'allow r' osd 'allow class-read object_prefix 
>>>>>>>>>>>>>>>>>> rbd_children, allow pool rbd rwx' -o 
>>>>>>>>>>>>>>>>>> client.rbd_mirror_backup.keyring --cluster ceph
>>>>>>>>>>>>>>>>>>            and deployed that key on the rbd-mirror machine, 
>>>>>>>>>>>>>>>>>> and started the service with:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for 
>>>>>>>>>>>>>>>>> your caps [1].
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That did the trick (in combination with the above)!
>>>>>>>>>>>>>>>> Again a case of PEBKAC: I should have read the documentation 
>>>>>>>>>>>>>>>> until the end, clearly my fault.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It works well now, even though it seems to run a bit slow (~35 
>>>>>>>>>>>>>>>> MB/s for the initial sync when everything is 1 GBit/s),
>>>>>>>>>>>>>>>> but that may also be caused by combination of some very 
>>>>>>>>>>>>>>>> limited hardware on the receiving end (which will be scaled up 
>>>>>>>>>>>>>>>> in the future).
>>>>>>>>>>>>>>>> A single host with 6 disks, replica 3 and a RAID controller 
>>>>>>>>>>>>>>>> which can only do RAID0 and not JBOD is certainly not ideal, 
>>>>>>>>>>>>>>>> so commit latency may cause this slow bandwidth.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You could try increasing "rbd_concurrent_management_ops" from 
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> default of 10 ops to something higher to attempt to account for 
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> latency. However, I wouldn't expect near-line speed w/ RBD 
>>>>>>>>>>>>>>> mirroring.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks - I will play with this option once we have more storage 
>>>>>>>>>>>>>> available in the target pool ;-).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>            systemctl start 
>>>>>>>>>>>>>>>>>> ceph-rbd-mirror@rbd_mirror_backup.service
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>         After this, everything looks fine:
>>>>>>>>>>>>>>>>>>          # rbd mirror pool info
>>>>>>>>>>>>>>>>>>            Mode: pool
>>>>>>>>>>>>>>>>>>            Peers:
>>>>>>>>>>>>>>>>>>             UUID                                 NAME      CLIENT
>>>>>>>>>>>>>>>>>>             XXXXXXXXXXX                          ceph-virt client.rbd_mirror
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>         The service also seems to start fine, but logs show 
>>>>>>>>>>>>>>>>>> (debug rbd_mirror=20):
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>         rbd::mirror::ClusterWatcher:0x5575e2a7d390 
>>>>>>>>>>>>>>>>>> resolve_peer_config_keys: retrieving config-key: pool_id=2, 
>>>>>>>>>>>>>>>>>> pool_name=rbd, peer_uuid=XXXXXXXXXXX
>>>>>>>>>>>>>>>>>>         rbd::mirror::Mirror: 0x5575e29c7240 
>>>>>>>>>>>>>>>>>> update_pool_replayers: enter
>>>>>>>>>>>>>>>>>>         rbd::mirror::Mirror: 0x5575e29c7240 
>>>>>>>>>>>>>>>>>> update_pool_replayers: restarting failed pool replayer for 
>>>>>>>>>>>>>>>>>> uuid: XXXXXXXXXXX cluster: ceph-virt client: 
>>>>>>>>>>>>>>>>>> client.rbd_mirror
>>>>>>>>>>>>>>>>>>         rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: 
>>>>>>>>>>>>>>>>>> replaying for uuid: XXXXXXXXXXX cluster: ceph-virt client: 
>>>>>>>>>>>>>>>>>> client.rbd_mirror
>>>>>>>>>>>>>>>>>>         rbd::mirror::PoolReplayer: 0x5575e2a7da20 
>>>>>>>>>>>>>>>>>> init_rados: error connecting to remote peer uuid: 
>>>>>>>>>>>>>>>>>> XXXXXXXXXXX cluster: ceph-virt client: client.rbd_mirror: 
>>>>>>>>>>>>>>>>>> (95) Operation not supported
>>>>>>>>>>>>>>>>>>         rbd::mirror::ServiceDaemon: 0x5575e29c8d70 
>>>>>>>>>>>>>>>>>> add_or_update_callout: pool_id=2, callout_id=2, 
>>>>>>>>>>>>>>>>>> callout_level=error, text=unable to connect to remote cluster
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If it's still broken after fixing your caps above, perhaps 
>>>>>>>>>>>>>>>>> increase
>>>>>>>>>>>>>>>>> debugging for "rados", "monc", "auth", and "ms" to see if you 
>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>> determine the source of the op not supported error.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I already tried storing the ceph.client.rbd_mirror.keyring 
>>>>>>>>>>>>>>>>>> (i.e. from the cluster with the live images) on the 
>>>>>>>>>>>>>>>>>> rbd-mirror machine explicitly (i.e. not only in mon config 
>>>>>>>>>>>>>>>>>> storage),
>>>>>>>>>>>>>>>>>> and after doing that:
>>>>>>>>>>>>>>>>>>        rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls
>>>>>>>>>>>>>>>>>> works fine. So it's not a connectivity issue. Maybe a 
>>>>>>>>>>>>>>>>>> permission issue? Or did I miss something?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Any idea what "operation not supported" means?
>>>>>>>>>>>>>>>>>> It's unclear to me whether things should work well using 
>>>>>>>>>>>>>>>>>> Mimic with Nautilus, and enabling pool mirroring but only 
>>>>>>>>>>>>>>>>>> having journaling on for two images is a supported case.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes and yes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2) Since there is a performance drawback (about 2x) for 
>>>>>>>>>>>>>>>>>> journaling, is it also possible to only mirror snapshots, 
>>>>>>>>>>>>>>>>>> and leave the live volumes alone?
>>>>>>>>>>>>>>>>>>          This would cover the common backup use case before
>>>>>>>>>>>>>>>>>> deferred mirroring is implemented (or is it there already?).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is in-development right now and will hopefully land for 
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> Octopus release.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That would be very cool. Just to clarify: You mean the "real" 
>>>>>>>>>>>>>>>> deferred mirroring, not a "snapshot only" mirroring?
>>>>>>>>>>>>>>>> Is it already clear if this will require Octopus (or a later 
>>>>>>>>>>>>>>>> release) on both ends, or only on the receiving side?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am not sure what you mean by deferred mirroring. You 
>>>>>>>>>>>>>>> can delay
>>>>>>>>>>>>>>> the replay of the journal via the "rbd_mirroring_replay_delay"
>>>>>>>>>>>>>>> configuration option so that your DR site can be X seconds 
>>>>>>>>>>>>>>> behind the
>>>>>>>>>>>>>>> primary at a minimum.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is indeed what I was thinking of...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For Octopus we are working on on-demand and
>>>>>>>>>>>>>>> scheduled snapshot mirroring between sites -- no journal is 
>>>>>>>>>>>>>>> involved.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ... and this is what I was dreaming of. We keep snapshots of VMs 
>>>>>>>>>>>>>> to be able to roll them back.
>>>>>>>>>>>>>> We'd like to also keep those snapshots in a separate Ceph 
>>>>>>>>>>>>>> instance as an additional safety-net (in addition to an offline 
>>>>>>>>>>>>>> backup of those snapshots with Benji backup).
>>>>>>>>>>>>>> It is not (yet) clear to me whether we can pay the "2 x" price 
>>>>>>>>>>>>>> for journaling in the long run, so this would be the way to go 
>>>>>>>>>>>>>> in case we can't.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Since I got you personally, I have two bonus questions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Your talk:
>>>>>>>>>>>>>>>>         
>>>>>>>>>>>>>>>> https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring.pdf
>>>>>>>>>>>>>>>>         mentions "rbd journal object flush age", which I'd 
>>>>>>>>>>>>>>>> translate with something like the "commit" mount option on a 
>>>>>>>>>>>>>>>> classical file system - correct?
>>>>>>>>>>>>>>>>         I don't find this switch documented anywhere, though - 
>>>>>>>>>>>>>>>> is there experience with it / what's the default?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It's a low-level knob that by default causes the journal to 
>>>>>>>>>>>>>>> flush its
>>>>>>>>>>>>>>> pending IO events before it allows the corresponding IO to be 
>>>>>>>>>>>>>>> issued
>>>>>>>>>>>>>>> against the backing image. Setting it to a value greater than zero
>>>>>>>>>>>>>>> will allow that many seconds of IO events to be batched together in a
>>>>>>>>>>>>>>> journal append operation, and it's helpful for high-throughput, small IO
>>>>>>>>>>>>>>> operations. Of course, it turned out that a bug had broken that option
>>>>>>>>>>>>>>> a while ago such that events would never batch, so a fix is currently
>>>>>>>>>>>>>>> scheduled for backport to all active releases [1] w/ the goal that no
>>>>>>>>>>>>>>> one should need to tweak it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's even better - since our setup is growing and we will keep 
>>>>>>>>>>>>>> upgrading, I'll then just keep things as they are now (no manual 
>>>>>>>>>>>>>> tweaking)
>>>>>>>>>>>>>> and tag along the development. Thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) I read I can run more than one rbd-mirror with 
>>>>>>>>>>>>>>>> Mimic/Nautilus. Do they load-balance the images, or "only" 
>>>>>>>>>>>>>>>> failover in case one of them dies?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Starting with Nautilus, the default configuration for 
>>>>>>>>>>>>>>> rbd-mirror is to
>>>>>>>>>>>>>>> evenly divide the number of mirrored images between all running
>>>>>>>>>>>>>>> daemons. This does not split the total load since some images 
>>>>>>>>>>>>>>> might be
>>>>>>>>>>>>>>> hotter than others, but it at least spreads the load.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's fine enough for our use case. Spreading by "hotness" is a 
>>>>>>>>>>>>>> task without a clear answer
>>>>>>>>>>>>>> and "temperature" may change quickly, so that's all I hoped for.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Many thanks again for the very helpful explanations!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>           Oliver
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers and many thanks for the quick and perfect help!
>>>>>>>>>>>>>>>>              Oliver
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Cheers and thanks in advance,
>>>>>>>>>>>>>>>>>>              Oliver
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>> ceph-users mailing list
>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1] 
>>>>>>>>>>>>>>>>> https://docs.ceph.com/docs/master/rbd/rbd-mirroring/#rbd-mirror-daemon
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Jason
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://github.com/ceph/ceph/pull/28539
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://tracker.ceph.com/issues/41780
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jason
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jason
>>>>>
>>>>
>>>>
>>>
>>>
>>
> 
> [1] https://tracker.ceph.com/issues/41833
>


