On Fri, Sep 13, 2019 at 10:41 AM Oliver Freyermuth
<[email protected]> wrote:
>
> Am 13.09.19 um 16:30 schrieb Jason Dillaman:
> > On Fri, Sep 13, 2019 at 10:17 AM Jason Dillaman <[email protected]> wrote:
> >>
> >> On Fri, Sep 13, 2019 at 10:02 AM Oliver Freyermuth
> >> <[email protected]> wrote:
> >>>
> >>> Dear Jason,
> >>>
> >>> thanks for the very detailed explanation! This was very instructive.
> >>> Sadly, the watchers look correct - see details inline.
> >>>
> >>> Am 13.09.19 um 15:02 schrieb Jason Dillaman:
> >>>> On Thu, Sep 12, 2019 at 9:55 PM Oliver Freyermuth
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>> Dear Jason,
> >>>>>
> >>>>> thanks for taking care and developing a patch so quickly!
> >>>>>
> >>>>> I have another strange observation to share. In our test setup, only a
> >>>>> single RBD mirroring daemon is running for 51 images.
> >>>>> It works fine with a constant stream of 1-2 MB/s, but at some point
> >>>>> after roughly 20 hours, _all_ images go to this interesting state:
> >>>>> -----------------------------------------
> >>>>> # rbd mirror image status test-vm.XXXXX-disk2
> >>>>> test-vm.XXXXX-disk2:
> >>>>> global_id: XXXXXXXXXXXXXXX
> >>>>> state: down+replaying
> >>>>> description: replaying, master_position=[object_number=14,
> >>>>> tag_tid=6, entry_tid=6338], mirror_position=[object_number=14,
> >>>>> tag_tid=6, entry_tid=6338], entries_behind_master=0
> >>>>> last_update: 2019-09-13 03:45:43
> >>>>> -----------------------------------------
> >>>>> Running this command several times, I see entry_tid increasing at both
> >>>>> ends, so mirroring seems to be working just fine.
> >>>>>
> >>>>> However:
> >>>>> -----------------------------------------
> >>>>> # rbd mirror pool status
> >>>>> health: WARNING
> >>>>> images: 51 total
> >>>>> 51 unknown
> >>>>> -----------------------------------------
> >>>>> The health warning is not visible in the dashboard (also not in the
> >>>>> mirroring menu). The daemon still seems to be running, has dropped
> >>>>> nothing in the logs, and claims to be "ok" in the dashboard - it's only
> >>>>> that all images show up in unknown state even though everything seems
> >>>>> to be working fine.
> >>>>>
> >>>>> Any idea on how to debug this?
> >>>>> When I restart the rbd-mirror service, all images come back as green. I
> >>>>> already encountered this twice in 3 days.
> >>>>
> >>>> The dashboard relies on the rbd-mirror daemon to provide it with errors and
> >>>> warnings. You can see the status reported by rbd-mirror by running
> >>>> "ceph service status":
> >>>>
> >>>> $ ceph service status
> >>>> {
> >>>> "rbd-mirror": {
> >>>> "4152": {
> >>>> "status_stamp": "2019-09-13T08:58:41.937491-0400",
> >>>> "last_beacon": "2019-09-13T08:58:41.937491-0400",
> >>>> "status": {
> >>>> "json":
> >>>> "{\"1\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":1,\"image_remote_count\":1,\"image_warning_count\":0,\"instance_id\":\"4154\",\"leader\":true},\"2\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0,\"instance_id\":\"4156\",\"leader\":true}}"
> >>>> }
> >>>> }
> >>>> }
> >>>> }
> >>>>
> >>>> In your case, most likely it seems like rbd-mirror thinks all is good
> >>>> with the world so it's not reporting any errors.
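> >>>>
> >>>> If jq is available, the embedded JSON status string can be
> >>>> pretty-printed with something like:
> >>>>
> >>>> $ ceph service status | jq -r '.["rbd-mirror"][].status.json | fromjson'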
> >>>
> >>> This is indeed the case:
> >>>
> >>> # ceph service status
> >>> {
> >>> "rbd-mirror": {
> >>> "84243": {
> >>> "status_stamp": "2019-09-13 15:40:01.149815",
> >>> "last_beacon": "2019-09-13 15:40:26.151381",
> >>> "status": {
> >>> "json":
> >>> "{\"2\":{\"name\":\"rbd\",\"callouts\":{},\"image_assigned_count\":51,\"image_error_count\":0,\"image_local_count\":51,\"image_remote_count\":51,\"image_warning_count\":0,\"instance_id\":\"84247\",\"leader\":true}}"
> >>> }
> >>> }
> >>> },
> >>> "rgw": {
> >>> ...
> >>> }
> >>> }
> >>>
> >>>> The "down" state indicates that the rbd-mirror daemon isn't correctly
> >>>> watching the "rbd_mirroring" object in the pool. You can see who is
> >>>> watching that object by running the "rados" "listwatchers" command:
> >>>>
> >>>> $ rados -p <pool name> listwatchers rbd_mirroring
> >>>> watcher=1.2.3.4:0/199388543 client.4154 cookie=94769010788992
> >>>> watcher=1.2.3.4:0/199388543 client.4154 cookie=94769061031424
> >>>>
> >>>> In my case, the "4154" from "client.4154" is the unique global id for
> >>>> my connection to the cluster, which relates back to the "ceph service
> >>>> status" dump which also shows status by daemon using the unique global
> >>>> id.
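> >>>>
> >>>> To cross-check, the client ids from the watcher list should line up
> >>>> with the instance ids reported via "ceph service status", e.g.
> >>>> (assuming jq is available):
> >>>>
> >>>> $ rados -p <pool name> listwatchers rbd_mirroring | awk '{print $2}' | sort -u
> >>>> $ ceph service status | jq -r '.["rbd-mirror"][].status.json | fromjson | .[].instance_id'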
> >>>
> >>> Sadly(?), this looks as expected:
> >>>
> >>> # rados -p rbd listwatchers rbd_mirroring
> >>> watcher=10.160.19.240:0/2922488671 client.84247 cookie=139770046978672
> >>> watcher=10.160.19.240:0/2922488671 client.84247 cookie=139771389162560
> >>
> >> Hmm, the unique id is different (84243 vs 84247). I wouldn't have
> >> expected the global id to have changed. Did you restart the Ceph
> >> cluster or MONs? Do you see any "peer assigned me a different
> >> global_id" errors in your rbd-mirror logs?
> >>
> >> I'll open a tracker ticket to fix the "ceph service status", though,
> >> since clearly your global id changed but it wasn't noticed by the
> >> service daemon status updater.
> >
> > ... also, can you please provide the output from the following via a
> > pastebin link?
> >
> > # rados -p rbd listomapvals rbd_mirroring
>
> Of course, here you go:
> https://0x0.st/zy8J.txt
Thanks. For the case above of global image id
1a53fafa-37ef-4edf-9633-c2ba3323ed93, the on-disk status shows that it
was last updated by client.84247 / nonce 2922488671, which correctly
matches your watcher so the status should be "up":
status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93
value (232 bytes) :
00000000 01 01 2c 00 00 00 08 17 49 01 00 00 00 00 00 01 |..,.....I.......|  <--- "17 49 01 00 00 00 00 00" (84247) is the instance id
00000010 01 01 1c 00 00 00 03 00 00 00 5f a3 31 ae 10 00 |.........._.1...|  <--- "5f a3 31 ae" is the nonce (2922488671)
00000020 00 00 02 00 00 00 0a a0 13 f0 00 00 00 00 00 00 |................|  <--- "0a a0 13 f0" is the IP address (10.160.19.240)
00000030 00 00 01 01 b0 00 00 00 04 a2 00 00 00 72 65 70 |.............rep|
00000040 6c 61 79 69 6e 67 2c 20 6d 61 73 74 65 72 5f 70 |laying, master_p|
00000050 6f 73 69 74 69 6f 6e 3d 5b 6f 62 6a 65 63 74 5f |osition=[object_|
00000060 6e 75 6d 62 65 72 3d 31 39 2c 20 74 61 67 5f 74 |number=19, tag_t|
00000070 69 64 3d 36 2c 20 65 6e 74 72 79 5f 74 69 64 3d |id=6, entry_tid=|
00000080 32 36 34 34 33 5d 2c 20 6d 69 72 72 6f 72 5f 70 |26443], mirror_p|
00000090 6f 73 69 74 69 6f 6e 3d 5b 6f 62 6a 65 63 74 5f |osition=[object_|
000000a0 6e 75 6d 62 65 72 3d 31 39 2c 20 74 61 67 5f 74 |number=19, tag_t|
000000b0 69 64 3d 36 2c 20 65 6e 74 72 79 5f 74 69 64 3d |id=6, entry_tid=|
000000c0 32 36 34 34 33 5d 2c 20 65 6e 74 72 69 65 73 5f |26443], entries_|
000000d0 62 65 68 69 6e 64 5f 6d 61 73 74 65 72 3d 30 51 |behind_master=0Q|
000000e0 aa 7b 5d 1b 5f 4f 33 00 |.{]._O3.|
000000e8
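
For reference, the multi-byte fields above are little-endian, so the
decoding can be sanity-checked by hand, e.g. with printf:

$ printf '%d\n' 0x014917        # "17 49 01 00 ..." reversed -> instance id
84247
$ printf '%d\n' 0xae31a35f      # "5f a3 31 ae" reversed -> nonce
2922488671
$ printf '%d.%d.%d.%d\n' 0x0a 0xa0 0x13 0xf0    # IP bytes in order
10.160.19.240
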
The only thing I can think of is that somehow the watcher entity
instance has a different encoding and it's failing a comparison. Can
you restart rbd-mirror such that the statuses list "up+replaying" and
then run the following?
# rados -p rbd getomapval rbd_mirroring
status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93
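
For the restart, something like this should do it (assuming the same
unit name as earlier in this thread):

# systemctl restart ceph-rbd-mirror@rbd_mirror_backup.service
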
> Cheers,
> Oliver
>
> >
> >>> However, the dashboard still shows those images in "unknown", and this
> >>> also shows up via command line:
> >>>
> >>> # rbd mirror pool status
> >>> health: WARNING
> >>> images: 51 total
> >>> 51 unknown
> >>> # rbd mirror image status test-vm.physik.uni-bonn.de-disk1
> >>> test-vm.physik.uni-bonn.de-disk2:
> >>> global_id: 1a53fafa-37ef-4edf-9633-c2ba3323ed93
> >>> state: down+replaying
> >>> description: replaying, master_position=[object_number=18, tag_tid=6,
> >>> entry_tid=25202], mirror_position=[object_number=18, tag_tid=6,
> >>> entry_tid=25202], entries_behind_master=0
> >>> last_update: 2019-09-13 15:55:15
> >>>
> >>> Any ideas on what else could cause this?
> >>>
> >>> Cheers and thanks,
> >>> Oliver
> >>>
> >>>>
> >>>>> Any idea on this (or how I can extract more information)?
> >>>>> I fear keeping high-level debug logs active for ~24h is not feasible.
> >>>>>
> >>>>> Cheers,
> >>>>> Oliver
> >>>>>
> >>>>>
> >>>>> On 2019-09-11 19:14, Jason Dillaman wrote:
> >>>>>> On Wed, Sep 11, 2019 at 12:57 PM Oliver Freyermuth
> >>>>>> <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Dear Jason,
> >>>>>>>
> >>>>>>> I played a bit more with rbd mirroring and learned that deleting an
> >>>>>>> image at the source (or disabling journaling on it) immediately moves
> >>>>>>> the image to trash at the target -
> >>>>>>> but setting rbd_mirroring_delete_delay helps to have some more grace
> >>>>>>> time to catch human mistakes.
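> >>>>>>>
> >>>>>>> For reference, a sketch of how that can be set via the MON config
> >>>>>>> store (value in seconds, here one day of grace time):
> >>>>>>> # ceph config set client rbd_mirroring_delete_delay 86400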
> >>>>>>>
> >>>>>>> However, I have issues restoring such an image which has been moved
> >>>>>>> to trash by the RBD-mirror daemon as user:
> >>>>>>> -----------------------------------
> >>>>>>> [root@mon001 ~]# rbd trash ls -la
> >>>>>>> ID NAME SOURCE DELETED_AT
> >>>>>>> STATUS PARENT
> >>>>>>> d4fbe8f63905 test-vm-XXXXXXXXXXXXXXXXXX-disk2 MIRRORING Wed Sep 11
> >>>>>>> 18:43:14 2019 protected until Thu Sep 12 18:43:14 2019
> >>>>>>> [root@mon001 ~]# rbd trash restore --image foo-image d4fbe8f63905
> >>>>>>> rbd: restore error: 2019-09-11 18:50:15.387 7f5fa9590b00 -1
> >>>>>>> librbd::api::Trash: restore: Current trash source: mirroring does not
> >>>>>>> match expected: user
> >>>>>>> (22) Invalid argument
> >>>>>>> -----------------------------------
> >>>>>>> This is issued on the mon, which has the client.admin key, so it
> >>>>>>> should not be a permission issue.
> >>>>>>> It also fails when I try that in the Dashboard.
> >>>>>>>
> >>>>>>> Sadly, the error message is not clear enough for me to figure out
> >>>>>>> what could be the problem - do you see what I did wrong?
> >>>>>>
> >>>>>> Good catch, it looks like we accidentally broke this in Nautilus when
> >>>>>> image live-migration support was added. I've opened a new tracker
> >>>>>> ticket to fix this [1].
> >>>>>>
> >>>>>>> Cheers and thanks again,
> >>>>>>> Oliver
> >>>>>>>
> >>>>>>> On 2019-09-10 23:17, Oliver Freyermuth wrote:
> >>>>>>>> Dear Jason,
> >>>>>>>>
> >>>>>>>> On 2019-09-10 23:04, Jason Dillaman wrote:
> >>>>>>>>> On Tue, Sep 10, 2019 at 2:08 PM Oliver Freyermuth
> >>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Dear Jason,
> >>>>>>>>>>
> >>>>>>>>>> On 2019-09-10 18:50, Jason Dillaman wrote:
> >>>>>>>>>>> On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
> >>>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Dear Cephalopodians,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I have two questions about RBD mirroring.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1) I can not get it to work - my setup is:
> >>>>>>>>>>>>
> >>>>>>>>>>>> - One cluster holding the live RBD volumes and snapshots,
> >>>>>>>>>>>> in pool "rbd", cluster name "ceph",
> >>>>>>>>>>>> running latest Mimic.
> >>>>>>>>>>>> I ran "rbd mirror pool enable rbd pool" on that cluster
> >>>>>>>>>>>> and created a cephx user "rbd_mirror" with (is there a better
> >>>>>>>>>>>> way?):
> >>>>>>>>>>>> ceph auth get-or-create client.rbd_mirror mon 'allow r'
> >>>>>>>>>>>> osd 'allow class-read object_prefix rbd_children, allow pool rbd
> >>>>>>>>>>>> r' -o ceph.client.rbd_mirror.keyring --cluster ceph
> >>>>>>>>>>>> In that pool, two images have the journaling feature
> >>>>>>>>>>>> activated, all others have it disabled still (so I would expect
> >>>>>>>>>>>> these two to be mirrored).
> >>>>>>>>>>>
> >>>>>>>>>>> You can just use "mon 'profile rbd' osd 'profile rbd'" for the caps --
> >>>>>>>>>>> but you definitely need more than read-only permissions to the remote
> >>>>>>>>>>> cluster since it needs to be able to create snapshots of remote images
> >>>>>>>>>>> and update/trim the image journals.
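> >>>>>>>>>>>
> >>>>>>>>>>> For example (the user name here is just illustrative):
> >>>>>>>>>>>
> >>>>>>>>>>> ceph auth get-or-create client.rbd-mirror-peer \
> >>>>>>>>>>>   mon 'profile rbd' osd 'profile rbd' \
> >>>>>>>>>>>   -o ceph.client.rbd-mirror-peer.keyring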
> >>>>>>>>>>
> >>>>>>>>>> these profiles really make life a lot easier. I should have
> >>>>>>>>>> thought of them rather than "guessing" a potentially good
> >>>>>>>>>> configuration...
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> - Another (empty) cluster running latest Nautilus,
> >>>>>>>>>>>> cluster name "ceph", pool "rbd".
> >>>>>>>>>>>> I've used the dashboard to activate mirroring for the
> >>>>>>>>>>>> RBD pool, and then added a peer with cluster name "ceph-virt",
> >>>>>>>>>>>> cephx-ID "rbd_mirror", filled in the mons and key created above.
> >>>>>>>>>>>> I've then run:
> >>>>>>>>>>>> ceph auth get-or-create client.rbd_mirror_backup mon
> >>>>>>>>>>>> 'allow r' osd 'allow class-read object_prefix rbd_children,
> >>>>>>>>>>>> allow pool rbd rwx' -o client.rbd_mirror_backup.keyring
> >>>>>>>>>>>> --cluster ceph
> >>>>>>>>>>>> and deployed that key on the rbd-mirror machine, and
> >>>>>>>>>>>> started the service with:
> >>>>>>>>>>>
> >>>>>>>>>>> Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for your
> >>>>>>>>>>> caps [1].
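> >>>>>>>>>>>
> >>>>>>>>>>> For your existing user that would be something like:
> >>>>>>>>>>>
> >>>>>>>>>>> ceph auth caps client.rbd_mirror_backup \
> >>>>>>>>>>>   mon 'profile rbd-mirror' osd 'profile rbd'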
> >>>>>>>>>>
> >>>>>>>>>> That did the trick (in combination with the above)!
> >>>>>>>>>> Again a case of PEBKAC: I should have read the documentation until
> >>>>>>>>>> the end, clearly my fault.
> >>>>>>>>>>
> >>>>>>>>>> It works well now, even though it seems to run a bit slow (~35
> >>>>>>>>>> MB/s for the initial sync when everything is 1 GBit/s),
> >>>>>>>>>> but that may also be caused by a combination of some very limited
> >>>>>>>>>> hardware on the receiving end (which will be scaled up in the
> >>>>>>>>>> future).
> >>>>>>>>>> A single host with 6 disks, replica 3 and a RAID controller which
> >>>>>>>>>> can only do RAID0 and not JBOD is certainly not ideal, so commit
> >>>>>>>>>> latency may cause this slow bandwidth.
> >>>>>>>>>
> >>>>>>>>> You could try increasing "rbd_concurrent_management_ops" from the
> >>>>>>>>> default of 10 ops to something higher to attempt to account for the
> >>>>>>>>> latency. However, I wouldn't expect near line-speed w/ RBD
> >>>>>>>>> mirroring.
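> >>>>>>>>>
> >>>>>>>>> E.g. (a sketch; tune the value to your hardware):
> >>>>>>>>> # ceph config set client rbd_concurrent_management_ops 20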
> >>>>>>>>
> >>>>>>>> Thanks - I will play with this option once we have more storage
> >>>>>>>> available in the target pool ;-).
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> systemctl start
> >>>>>>>>>>>> ceph-rbd-mirror@rbd_mirror_backup.service
> >>>>>>>>>>>>
> >>>>>>>>>>>> After this, everything looks fine:
> >>>>>>>>>>>> # rbd mirror pool info
> >>>>>>>>>>>> Mode: pool
> >>>>>>>>>>>> Peers:
> >>>>>>>>>>>> UUID NAME CLIENT
> >>>>>>>>>>>> XXXXXXXXXXX ceph-virt
> >>>>>>>>>>>> client.rbd_mirror
> >>>>>>>>>>>>
> >>>>>>>>>>>> The service also seems to start fine, but logs show (debug
> >>>>>>>>>>>> rbd_mirror=20):
> >>>>>>>>>>>>
> >>>>>>>>>>>> rbd::mirror::ClusterWatcher:0x5575e2a7d390
> >>>>>>>>>>>> resolve_peer_config_keys: retrieving config-key: pool_id=2,
> >>>>>>>>>>>> pool_name=rbd, peer_uuid=XXXXXXXXXXX
> >>>>>>>>>>>> rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers:
> >>>>>>>>>>>> enter
> >>>>>>>>>>>> rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers:
> >>>>>>>>>>>> restarting failed pool replayer for uuid: XXXXXXXXXXX cluster:
> >>>>>>>>>>>> ceph-virt client: client.rbd_mirror
> >>>>>>>>>>>> rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: replaying
> >>>>>>>>>>>> for uuid: XXXXXXXXXXX cluster: ceph-virt client:
> >>>>>>>>>>>> client.rbd_mirror
> >>>>>>>>>>>> rbd::mirror::PoolReplayer: 0x5575e2a7da20 init_rados:
> >>>>>>>>>>>> error connecting to remote peer uuid: XXXXXXXXXXX cluster:
> >>>>>>>>>>>> ceph-virt client: client.rbd_mirror: (95) Operation not supported
> >>>>>>>>>>>> rbd::mirror::ServiceDaemon: 0x5575e29c8d70
> >>>>>>>>>>>> add_or_update_callout: pool_id=2, callout_id=2,
> >>>>>>>>>>>> callout_level=error, text=unable to connect to remote cluster
> >>>>>>>>>>>
> >>>>>>>>>>> If it's still broken after fixing your caps above, perhaps
> >>>>>>>>>>> increase
> >>>>>>>>>>> debugging for "rados", "monc", "auth", and "ms" to see if you can
> >>>>>>>>>>> determine the source of the op not supported error.
> >>>>>>>>>>>
> >>>>>>>>>>>> I already tried storing the ceph.client.rbd_mirror.keyring (i.e.
> >>>>>>>>>>>> from the cluster with the live images) on the rbd-mirror machine
> >>>>>>>>>>>> explicitly (i.e. not only in mon config storage),
> >>>>>>>>>>>> and after doing that:
> >>>>>>>>>>>> rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls
> >>>>>>>>>>>> works fine. So it's not a connectivity issue. Maybe a permission
> >>>>>>>>>>>> issue? Or did I miss something?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Any idea what "operation not supported" means?
> >>>>>>>>>>>> It's unclear to me whether mixing Mimic with Nautilus should work
> >>>>>>>>>>>> well, and whether enabling pool mirroring but only having
> >>>>>>>>>>>> journaling on for two images is a supported case.
> >>>>>>>>>>>
> >>>>>>>>>>> Yes and yes.
> >>>>>>>>>>>
> >>>>>>>>>>>> 2) Since there is a performance drawback (about 2x) for
> >>>>>>>>>>>> journaling, is it also possible to only mirror snapshots, and
> >>>>>>>>>>>> leave the live volumes alone?
> >>>>>>>>>>>> This would cover the common backup use case before
> >>>>>>>>>>>> deferred mirroring is implemented (or is it there already?).
> >>>>>>>>>>>
> >>>>>>>>>>> This is in development right now and will hopefully land for the
> >>>>>>>>>>> Octopus release.
> >>>>>>>>>>
> >>>>>>>>>> That would be very cool. Just to clarify: You mean the "real"
> >>>>>>>>>> deferred mirroring, not a "snapshot only" mirroring?
> >>>>>>>>>> Is it already clear if this will require Octopus (or a later
> >>>>>>>>>> release) on both ends, or only on the receiving side?
> >>>>>>>>>
> >>>>>>>>> I'm not quite sure what you mean by deferred mirroring. You can delay
> >>>>>>>>> the replay of the journal via the "rbd_mirroring_replay_delay"
> >>>>>>>>> configuration option so that your DR site can be X seconds behind the
> >>>>>>>>> primary at a minimum.
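> >>>>>>>>>
> >>>>>>>>> For example, to keep the DR site at least 10 minutes behind (a
> >>>>>>>>> sketch; the value is in seconds):
> >>>>>>>>> # ceph config set client rbd_mirroring_replay_delay 600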
> >>>>>>>>
> >>>>>>>> This is indeed what I was thinking of...
> >>>>>>>>
> >>>>>>>>> For Octopus we are working on on-demand and
> >>>>>>>>> scheduled snapshot mirroring between sites -- no journal is
> >>>>>>>>> involved.
> >>>>>>>>
> >>>>>>>> ... and this is what I was dreaming of. We keep snapshots of VMs to
> >>>>>>>> be able to roll them back.
> >>>>>>>> We'd like to also keep those snapshots in a separate Ceph instance
> >>>>>>>> as an additional safety-net (in addition to an offline backup of
> >>>>>>>> those snapshots with Benji backup).
> >>>>>>>> It is not (yet) clear to me whether we can pay the "2 x" price for
> >>>>>>>> journaling in the long run, so this would be the way to go in case
> >>>>>>>> we can't.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> Since I got you personally, I have two bonus questions.
> >>>>>>>>>>
> >>>>>>>>>> 1) Your talk:
> >>>>>>>>>>
> >>>>>>>>>> https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring.pdf
> >>>>>>>>>> mentions "rbd journal object flush age", which I'd translate
> >>>>>>>>>> with something like the "commit" mount option on a classical file
> >>>>>>>>>> system - correct?
> >>>>>>>>>> I don't find this switch documented anywhere, though - is
> >>>>>>>>>> there experience with it / what's the default?
> >>>>>>>>>
> >>>>>>>>> It's a low-level knob that by default causes the journal to flush its
> >>>>>>>>> pending IO events before it allows the corresponding IO to be issued
> >>>>>>>>> against the backing image. Setting it to a value greater than zero
> >>>>>>>>> will allow that many seconds of IO events to be batched together in a
> >>>>>>>>> journal append operation, and it's helpful for high-throughput, small
> >>>>>>>>> IO operations. Of course, it turned out that a bug had broken that
> >>>>>>>>> option a while ago such that events would never batch, so a fix is
> >>>>>>>>> currently scheduled for backport to all active releases [1] w/ the
> >>>>>>>>> goal that no one should need to tweak it.
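> >>>>>>>>>
> >>>>>>>>> E.g. (a sketch; the value is in seconds):
> >>>>>>>>> # ceph config set client rbd_journal_object_flush_age 0.1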
> >>>>>>>>
> >>>>>>>> That's even better - since our setup is growing and we will keep
> >>>>>>>> upgrading, I'll then just keep things as they are now (no manual
> >>>>>>>> tweaking)
> >>>>>>>> and tag along the development. Thanks!
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> 2) I read I can run more than one rbd-mirror with Mimic/Nautilus.
> >>>>>>>>>> Do they load-balance the images, or "only" fail over in case one of
> >>>>>>>>>> them dies?
> >>>>>>>>>
> >>>>>>>>> Starting with Nautilus, the default configuration for rbd-mirror is to
> >>>>>>>>> evenly divide the number of mirrored images between all running
> >>>>>>>>> daemons. This does not evenly split the total load, since some images
> >>>>>>>>> might be hotter than others, but it at least spreads the load.
> >>>>>>>>
> >>>>>>>> That's fine enough for our use case. Spreading by "hotness" is a
> >>>>>>>> task without a clear answer
> >>>>>>>> and "temperature" may change quickly, so that's all I hoped for.
> >>>>>>>>
> >>>>>>>> Many thanks again for the very helpful explanations!
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Oliver
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Cheers and many thanks for the quick and perfect help!
> >>>>>>>>>> Oliver
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> Cheers and thanks in advance,
> >>>>>>>>>>>> Oliver
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> [1]
> >>>>>>>>>>> https://docs.ceph.com/docs/master/rbd/rbd-mirroring/#rbd-mirror-daemon
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Jason
> >>>>>>>>>>>
> >>>>>>>>> [1] https://github.com/ceph/ceph/pull/28539
> >>>>>> [1] https://tracker.ceph.com/issues/41780
> >> --
> >> Jason
--
Jason
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com