Hi all,
there have been many reports about overly slow backfill lately, and most of them
seemed related to a problem with mclock op scheduling in Quincy. The hallmark
was that backfill started fast and then slowed down a lot. I am now making the same
observation on an Octopus cluster with wpq, and it looks very much like a
problem with the scheduling of backfill operations. Here is what I see:
We added 95 disks to a set of disks shared by 2 pools. This is about 8% of the
total number of disks and they were distributed over all 12 OSD hosts. The 2
pools are 8+2 and 8+3 EC fs-data pools. Initially the backfill was as fast as
expected, but over the last day it has been really slow compared with expectations.
Only 33 PGs were backfilling. I have osd_max_backfills=3, and a simple estimate
(sketched below) says there should be between 100 and 200 PGs backfilling.
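For what it's worth, the estimate is nothing more than this back-of-the-envelope
calculation (my assumptions: each of the 95 new OSDs is a backfill target that can
hold up to osd_max_backfills reservations, and source-side reservations on the
remaining ~1165 OSDs are not the main limit at this ratio):

# target-side upper bound: 95 new OSDs * osd_max_backfills reservations each
echo $(( 95 * 3 ))    # 285
# allowing for source-side reservation collisions and EC shards sharing OSDs,
# I would expect roughly half to two thirds of that, i.e. 100 - 200 PGs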
To speed things up, I increased osd_max_backfills to 5, and the number of
backfilling PGs jumped right up to over 200. That's way more than the relative
increase alone would warrant. Just to check, I set osd_max_backfills back to 3
to see if the number of PGs would drop back to about 30. But no! Now I have 142 PGs
backfilling, which is in line with my estimate.
This looks very much like PGs that are eligible for backfill don't start, or
backfill reservations get dropped for some reason. Can anyone help me figure out
what might be the problem? I don't want to have to run a cron job that toggles
osd_max_backfills up and down (something like the sketch below); there must be
something else at play here. Output of ceph status and the config set commands
is included further down.
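Just to be explicit about the workaround I want to avoid, it would be something
along these lines, run periodically from cron. The raised value, the sleep and
the schedule are placeholders; the config set syntax is the same one shown in
the transcript below:

#!/bin/bash
# kick-backfill.sh - hypothetical workaround: temporarily raise
# osd_max_backfills to force new backfill reservations, then drop it
# back to the normal value.
ceph config set osd/class:hdd osd_max_backfills 5
sleep 300
ceph config set osd/class:hdd osd_max_backfills 3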
The number of backfilling PGs is decreasing again, and I would really like this
to be stable by itself. To give an idea of the scale of the problem: we are talking
about a rebalancing that takes either 2 weeks or 2 months. That's not a bagatelle.
Thanks and best regards,
Frank
[root@gnosis ~]# ceph config dump | sed -e "s/ */ /g" | grep :hdd | grep osd_max_backfills
osd class:hdd advanced osd_max_backfills 3
[root@gnosis ~]# ceph status
  cluster:
    id:     ###
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 7d)
    mgr: ceph-25(active, since 10w), standbys: ceph-03, ceph-02, ceph-01, ceph-26
    mds: con-fs2:8 4 up:standby 8 up:active
    osd: 1260 osds: 1260 up (since 2d), 1260 in (since 2d); 6487 remapped pgs

  task status:

  data:
    pools:   14 pools, 25065 pgs
    objects: 1.49G objects, 2.8 PiB
    usage:   3.4 PiB used, 9.7 PiB / 13 PiB avail
    pgs:     2466697364/12910834502 objects misplaced (19.106%)
             18571 active+clean
             6453  active+remapped+backfill_wait
             34    active+remapped+backfilling
             7     active+clean+snaptrim

  io:
    client:   30 MiB/s rd, 221 MiB/s wr, 1.08k op/s rd, 1.54k op/s wr
    recovery: 1.0 GiB/s, 380 objects/s
[root@gnosis ~]# ceph config set osd/class:hdd osd_max_backfills 5
[root@gnosis ~]# ceph status
  cluster:
    id:     ###
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 7d)
    mgr: ceph-25(active, since 10w), standbys: ceph-03, ceph-02, ceph-01, ceph-26
    mds: con-fs2:8 4 up:standby 8 up:active
    osd: 1260 osds: 1260 up (since 2d), 1260 in (since 2d); 6481 remapped pgs

  task status:

  data:
    pools:   14 pools, 25065 pgs
    objects: 1.49G objects, 2.8 PiB
    usage:   3.4 PiB used, 9.7 PiB / 13 PiB avail
    pgs:     2466120124/12911195308 objects misplaced (19.101%)
             18574 active+clean
             6247  active+remapped+backfill_wait
             234   active+remapped+backfilling
             6     active+clean+snaptrim
             2     active+clean+scrubbing+deep
             2     active+clean+scrubbing

  io:
    client:   34 MiB/s rd, 236 MiB/s wr, 1.28k op/s rd, 2.03k op/s wr
    recovery: 6.4 GiB/s, 2.39k objects/s
[root@gnosis ~]# ceph config set osd/class:hdd osd_max_backfills 3
[root@gnosis ~]# ceph status
  cluster:
    id:     ###
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 7d)
    mgr: ceph-25(active, since 10w), standbys: ceph-03, ceph-02, ceph-01, ceph-26
    mds: con-fs2:8 4 up:standby 8 up:active
    osd: 1260 osds: 1260 up (since 2d), 1260 in (since 2d); 6481 remapped pgs

  task status:

  data:
    pools:   14 pools, 25065 pgs
    objects: 1.49G objects, 2.8 PiB
    usage:   3.4 PiB used, 9.7 PiB / 13 PiB avail
    pgs:     2465974875/12911218789 objects misplaced (19.099%)
             18578 active+clean
             6339  active+remapped+backfill_wait
             142   active+remapped+backfilling
             6     active+clean+snaptrim

  io:
    client:   32 MiB/s rd, 247 MiB/s wr, 1.10k op/s rd, 1.57k op/s wr
    recovery: 4.2 GiB/s, 1.56k objects/s
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14