On Tue, 9 Apr 2019 at 17:48, Jason Dillaman <jdill...@redhat.com> wrote:
> Any chance your rbd-mirror daemon has the admin sockets available
> (defaults to /var/run/ceph/cephdr-client.<id>.<pid>.<random>.asok)? If
> so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status".

{
  "pool_replayers": [
    {
      "pool": "glance",
      "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00 cluster: production client: client.productionbackup",
      "instance_id": "869081",
      "leader_instance_id": "869081",
      "leader": true,
      "instances": [],
      "local_cluster_admin_socket": "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok",
      "remote_cluster_admin_socket": "/var/run/ceph/client.productionbackup.1936211.production.94225675210000.asok",
      "sync_throttler": {
        "max_parallel_syncs": 5,
        "running_syncs": 0,
        "waiting_syncs": 0
      },
      "image_replayers": [
        {
          "name": "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0",
          "state": "Replaying"
        },
        {
          "name": "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62",
          "state": "Replaying"
        },
-------------------cut----------
        {
          "name": "cinder/volume-bcb41f46-3716-4ee2-aa19-6fbc241fbf05",
          "state": "Replaying"
        }
      ]
    },
    {
      "pool": "nova",
      "peer": "uuid: 1fc7fefc-9bcb-4f36-a259-66c3d8086702 cluster: production client: client.productionbackup",
      "instance_id": "889074",
      "leader_instance_id": "889074",
      "leader": true,
      "instances": [],
      "local_cluster_admin_socket": "/var/run/ceph/client.backup.1936211.backup.94225678548048.asok",
      "remote_cluster_admin_socket": "/var/run/ceph/client.productionbackup.1936211.production.94225679621728.asok",
      "sync_throttler": {
        "max_parallel_syncs": 5,
        "running_syncs": 0,
        "waiting_syncs": 0
      },
      "image_replayers": []
    }
  ],
  "image_deleter": {
    "image_deleter_status": {
      "delete_images_queue": [
        {
          "local_pool_id": 3,
          "global_image_id": "ff531159-de6f-4324-a022-50c079dedd45"
        }
      ],
      "failed_deletes_queue": []
    }

>
> On Tue, Apr 9, 2019 at 11:26 AM Magnus Grönlund <mag...@gronlund.se>
> wrote:
> >
> >
> > On Tue, 9 Apr 2019 at 17:14, Jason Dillaman <jdill...@redhat.com> wrote:
> >>
> >> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund <mag...@gronlund.se>
> >> wrote:
> >> >
> >> > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund <mag...@gronlund.se>
> >> > >wrote:
> >> > >>
> >> > >> Hi,
> >> > >> We have configured one-way replication of pools between a
> >> > >> production cluster and a backup cluster. But unfortunately the
> >> > >> rbd-mirror on the backup cluster is unable to keep up with the
> >> > >> production cluster, so the replication fails to reach the
> >> > >> replaying state.
> >> > >
> >> > >Hmm, it's odd that they don't at least reach the replaying state. Are
> >> > >they still performing the initial sync?
> >> >
> >> > There are three pools we try to mirror (glance, cinder, and nova; no
> >> > points for guessing what the cluster is used for :) ).
> >> > The glance and cinder pools are smaller and see limited write
> >> > activity, and their mirroring works; the nova pool, which is the
> >> > largest and has 90% of the write activity, never leaves the
> >> > "unknown" state.
> >> >
> >> > # rbd mirror pool status cinder
> >> > health: OK
> >> > images: 892 total
> >> >     890 replaying
> >> >     2 stopped
> >> > #
> >> > # rbd mirror pool status nova
> >> > health: WARNING
> >> > images: 2479 total
> >> >     2479 unknown
> >> > #
> >> > The production cluster has 5k writes/s on average and the backup
> >> > cluster has 1-2k writes/s on average. The production cluster is
> >> > bigger and has better specs. I thought that the backup cluster would
> >> > be able to keep up, but it looks like I was wrong.
> >>
> >> The fact that they are in the unknown state just means that the remote
> >> "rbd-mirror" daemon hasn't started any journal replayers against the
> >> images. If it couldn't keep up, it would still report a status of
> >> "up+replaying". What Ceph release are you running on your backup
> >> cluster?
> >
> > The backup cluster is running Luminous 12.2.11 (the production cluster
> > 12.2.10).
> >
> >> >
> >> > >> And the journals on the rbd volumes keep growing...
> >> > >>
> >> > >> Is it enough to simply disable the mirroring of the pool (rbd
> >> > >> mirror pool disable <pool>), and will that remove the lagging
> >> > >> reader from the journals and shrink them, or is there anything
> >> > >> else that has to be done?
> >> > >
> >> > >You can either disable the journaling feature on the image(s), since
> >> > >there is no point leaving it on if you aren't using mirroring, or
> >> > >run "rbd mirror pool disable <pool>" to purge the journals.
> >> >
> >> > Thanks for the confirmation.
> >> > I will stop the mirroring of the nova pool and try to figure out if
> >> > there is anything we can do to get the backup cluster to keep up.
> >> >
> >> > >> Best regards
> >> > >> /Magnus
> >> > >> _______________________________________________
> >> > >> ceph-users mailing list
> >> > >> ceph-users@lists.ceph.com
> >> > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> > >
> >> > >--
> >> > >Jason
> >>
> >>
> >> --
> >> Jason
>
> --
> Jason
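For anyone finding this thread later, Jason's admin-socket check can be run roughly as below. This is a sketch, not an exact recipe: the socket filename pattern varies with the client name and daemon pid, so list the directory first and substitute whatever you actually see there.

```shell
# List the admin sockets present on the node running rbd-mirror;
# the filenames encode client name, pid, and a random suffix.
ls /var/run/ceph/*.asok

# Ask a running rbd-mirror daemon for its mirroring status
# (the socket path here is an example from this thread; use your own).
ceph --admin-daemon \
    /var/run/ceph/client.backup.1936211.backup.94225674131712.asok \
    rbd mirror status
```

The output is the JSON document shown earlier in the thread, with one "pool_replayers" entry per mirrored pool and the per-image replayer states.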
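To make the two clean-up options Jason describes concrete, a sketch (the pool name "nova" comes from the thread; the image name is a placeholder, and both commands are run against the production cluster, where the journals live):

```shell
# Option 1: stop journaling on an individual image you no longer mirror;
# this drops the journal for that image. <image-name> is a placeholder.
rbd feature disable nova/<image-name> journaling

# Option 2: disable mirroring for the whole pool, which purges the
# pool's journals in one step.
rbd mirror pool disable nova
```

Either way the lagging journal reader goes away, so the journals stop growing and their space is reclaimed.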
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com