Hi Eugen,
Thanks for the suggestion. I've repeated my attempt with the wpq scheduler (I
ran "ceph config set osd osd_op_queue wpq" and restarted all the OSDs).
That still seems to be either slow or stuck in a draining state - 10 mins
elapsed draining for just a few MB of data.
$ ceph orch osd rm status ; date
OSD  HOST         STATE     PGS  REPLACE  FORCE  ZAP   DRAIN STARTED AT
2    raynor-sc-2  draining  117  False    False  True  2024-10-21 13:48:52.559054
Mon Oct 21 13:59:33 UTC 2024
$ ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA    OMAP     META     AVAIL   %USE  VAR   PGS  STATUS
0   hdd    0.01459  1.00000   15 GiB  64 MiB   22 MiB  0 B      42 MiB   15 GiB  0.42  1.14  117  up
2   hdd    0        1.00000   15 GiB  52 MiB   22 MiB  2 KiB    30 MiB   15 GiB  0.34  0.93  117  up
3   hdd    0.01459  1.00000   15 GiB  52 MiB   21 MiB  7 KiB    32 MiB   15 GiB  0.34  0.93  117  up
                    TOTAL     45 GiB  169 MiB  66 MiB  9.5 KiB  104 MiB  45 GiB  0.37
MIN/MAX VAR: 0.93/1.14 STDDEV: 0.04
$ ceph -s
  cluster:
    id:     e773d9c2-6d8d-4413-8e8f-e38f248f5959
    health: HEALTH_OK
  services:
    mon: 2 daemons, quorum raynor-sc-1,raynor-sc-3 (age 7m)
    mgr: raynor-sc-1.hjpano(active, since 10m), standbys: raynor-sc-3.grmovv
    mds: 1/1 daemons up, 1 standby
    osd: 3 osds: 3 up (since 10m), 3 in (since 74m); 117 remapped pgs
    rgw: 2 daemons active (2 hosts, 1 zones)
  data:
    volumes: 1/1 healthy
    pools:   9 pools, 117 pgs
    objects: 250 objects, 479 KiB
    usage:   181 MiB used, 45 GiB / 45 GiB avail
    pgs:     250/750 objects misplaced (33.333%)
             117 active+clean+remapped
Interesting that the cluster reports 33% of objects as misplaced and all 117
PGs as remapped; that suggests to me that it's stuck rather than slow. I wonder
whether it's actually possible to drop below 3 OSDs in this manner?
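A quick sanity check on the "ceph -s" figures above, as a sketch rather than a diagnosis: 250 misplaced out of 750 object copies works out to exactly one copy per object, which is what you would expect if every pool has size 3 and the copy on the draining OSD has nowhere else to go. That all nine pools are replicated with size 3 and a host-level failure domain is an assumption here, not something the output confirms; "ceph osd pool ls detail" and "ceph osd crush rule dump" would show the actual values.

```shell
#!/bin/sh
# Share of object copies that are misplaced, as a percentage.
# Arguments: misplaced copies, total copies (both taken from "ceph -s").
misplaced_pct() {
    awk -v m="$1" -v t="$2" 'BEGIN { printf "%.3f\n", 100 * m / t }'
}

# Copies per object implied by the totals: total copies / objects.
copies_per_object() {
    awk -v t="$1" -v o="$2" 'BEGIN { printf "%d\n", t / o }'
}

misplaced_pct 250 750        # figures from the "pgs:" line above
copies_per_object 750 250    # 750 copies over 250 objects

# To check the assumptions on a live cluster (not run here):
#   ceph osd pool ls detail      # per-pool size / min_size / crush_rule
#   ceph osd crush rule dump     # failure domain used by each rule
```

If the pools do turn out to be size 3 with one copy per host, two remaining hosts cannot hold three copies, so the drain would wait indefinitely rather than run slowly.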
Thanks,
Alex
________________________________
From: Eugen Block <[email protected]>
Sent: Monday, October 21, 2024 2:20 PM
To: [email protected] <[email protected]>
Subject: [EXTERNAL] [ceph-users] Re: How to Speed Up Draining OSDs?
Hi,
for a production cluster I'd recommend sticking to wpq at the moment,
where you can apply "legacy" recovery settings. If you're willing to
help the devs get to the bottom of this, I'm sure they would highly
appreciate it. But I currently know too little about mclock to know
the right knobs. So far I've tried only a few different settings, and
none helped significantly.
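For reference, a minimal sketch of what switching to wpq and applying the legacy recovery settings might look like. The option names (osd_op_queue, osd_max_backfills, osd_recovery_max_active) are standard Ceph options, but the values 4 and 8 are illustrative assumptions, not recommendations from this thread, and the "run" helper is a hypothetical wrapper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical wrapper: print each command; execute it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

apply_wpq_recovery_settings() {
    # Switch the OSD op scheduler to wpq; takes effect after an OSD restart.
    run ceph config set osd osd_op_queue wpq
    # Legacy recovery/backfill throttles (values are illustrative assumptions).
    run ceph config set osd osd_max_backfills 4
    run ceph config set osd osd_recovery_max_active 8
}

apply_wpq_recovery_settings
```

Running it as-is only echoes the three commands, so they can be reviewed before being applied with DRY_RUN=0 on a host that has the ceph CLI and an admin keyring.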
I would expect that there are existing tracker issues since this topic
comes up every other week or so. If not, I'd suggest creating one.
Thanks,
Eugen
Quoting "Alex Hussein-Kershaw (HE/HIM)" <[email protected]>:
> Hi Folks,
>
> I'm trying to scale in a Ceph cluster. It's running 19.2.0 and is
> cephadm managed. It's just a test system, so has basically no data
> and only has 3 OSDs.
>
> As part of the scaling-in, I run "ceph orch host drain <hostname>
> --zap-osd-devices" as per the "Host Management" page of the Ceph
> documentation
> <https://docs.ceph.com/en/reef/cephadm/host-management/#removing-hosts>.
> That starts off the OSD draining.
>
> However, that drain seems to take an enormous amount of time. My OSD
> has less than 100MiB raw storage, and I let it run for 2 hours over
> lunch and it still was not finished, so I cancelled it.
>
> I'm not sure how this scales, but I'm assuming at least linearly
> with data stored, which seems like bad news for doing this on real
> systems, which may have several TBs per OSD.
>
> I had a look at the recovery profiles documentation in the "mClock
> Config Reference" page of the Ceph documentation
> <https://docs.ceph.com/en/reef/rados/configuration/mclock-config-ref/>,
> which seemed to indicate I could speed this up (but my impression was
> that I could maybe get a 2x speed-up, which seems like it will still
> take an age).
>
> On the other hand, just switching off the host running the OSD and
> doing an offline host removal ("ceph orch host rm <hostname>
> --offline") seems much easier, with the trade-off that the Cluster
> recovers after the loss of the OSD rather than pre-emptively. But
> the big risk of that seems to be mitigated by running "ceph orch host
> ok-to-stop <hostname>" beforehand to check I won't cause any PGs to
> go offline.
>
> Are there any tricks here that I'm missing?
>
> Thanks,
> Alex
>
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]