Hi,
I'm trying to find the reason for some strange recovery issues I'm seeing on
our cluster.
It's a mostly idle 4-node cluster with 26 OSDs evenly distributed
across the nodes, running jewel 10.2.9.
The problem is that after some disk replacements and data moves, recovery
is progressing extremely slowly. PGs seem to be stuck in the
active+recovering+degraded state:
[root@v1d ~]# ceph -s
    cluster a5efbc87-3900-4c42-a977-8c93f7aa8c33
     health HEALTH_WARN
            159 pgs backfill_wait
            4 pgs backfilling
            259 pgs degraded
            12 pgs recovering
            113 pgs recovery_wait
            215 pgs stuck degraded
            266 pgs stuck unclean
            140 pgs stuck undersized
            151 pgs undersized
            recovery 37788/2327775 objects degraded (1.623%)
            recovery 23854/2327775 objects misplaced (1.025%)
            noout,noin flag(s) set
     monmap e21: 3 mons at {v1a=10.0.0.1:6789/0,v1b=10.0.0.2:6789/0,v1c=10.0.0.3:6789/0}
            election epoch 6160, quorum 0,1,2 v1a,v1b,v1c
      fsmap e817: 1/1/1 up {0=v1a=up:active}, 1 up:standby
     osdmap e76002: 26 osds: 26 up, 26 in; 185 remapped pgs
            flags noout,noin,sortbitwise,require_jewel_osds
      pgmap v80995844: 3200 pgs, 4 pools, 2876 GB data, 757 kobjects
            9215 GB used, 35572 GB / 45365 GB avail
            37788/2327775 objects degraded (1.623%)
            23854/2327775 objects misplaced (1.025%)
                2912 active+clean
                 130 active+undersized+degraded+remapped+wait_backfill
                  97 active+recovery_wait+degraded
                  29 active+remapped+wait_backfill
                  12 active+recovery_wait+undersized+degraded+remapped
                   6 active+recovering+degraded
                   5 active+recovering+undersized+degraded+remapped
                   4 active+undersized+degraded+remapped+backfilling
                   4 active+recovery_wait+degraded+remapped
                   1 active+recovering+degraded+remapped
  client io 2026 B/s rd, 146 kB/s wr, 9 op/s rd, 21 op/s wr
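As a cross-check on the numbers above, here is a minimal Python sketch that tallies the PG state lines per individual state flag. The state lines are copied from the pgmap section of the output; the names PG_STATES and tally are only illustrative and not part of any Ceph tooling.

```python
from collections import Counter

# PG state lines copied from the pgmap section of the `ceph -s` output above.
PG_STATES = """\
2912 active+clean
130 active+undersized+degraded+remapped+wait_backfill
97 active+recovery_wait+degraded
29 active+remapped+wait_backfill
12 active+recovery_wait+undersized+degraded+remapped
6 active+recovering+degraded
5 active+recovering+undersized+degraded+remapped
4 active+undersized+degraded+remapped+backfilling
4 active+recovery_wait+degraded+remapped
1 active+recovering+degraded+remapped
"""


def tally(text):
    """Return (total PG count, Counter of PGs per individual state flag)."""
    total = 0
    flags = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue
        count, state = line.split()
        total += int(count)
        # A combined state such as "active+recovering+degraded" counts
        # once towards each of its flags.
        for flag in state.split("+"):
            flags[flag] += int(count)
    return total, flags


total, flags = tally(PG_STATES)
print("total pgs:", total)
for flag in ("recovering", "recovery_wait", "backfilling", "wait_backfill"):
    print(flag, flags[flag])
```

The per-flag sums agree with the health section: 12 recovering, 113 recovery_wait, 4 backfilling and 159 wait_backfill (reported there as backfill_wait), out of 3200 PGs in total.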
When I restart the affected OSDs, recovery moves forward for a while, but then
other PGs get stuck. All OSDs have been restarted multiple times and none are
even close to nearfull. I just can't find what I'm doing wrong.
Possibly related OSD options:
osd max backfills = 4
osd recovery max active = 15
debug osd = 0/0
osd op threads = 4
osd backfill scan min = 4
osd backfill scan max = 16
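Values set in ceph.conf are not necessarily the values a running OSD is using, so one way to confirm the options above would be to query an OSD's admin socket. The Python sketch below only builds the `ceph daemon ... config get` command lines from the list above; it does not run them. The helper name check_commands and the OSD id 0 are placeholders, and the commands would have to be run on the node that hosts the OSD in question.

```python
# Options as listed above (ceph.conf style, with spaces in the names).
OPTIONS = """\
osd max backfills = 4
osd recovery max active = 15
debug osd = 0/0
osd op threads = 4
osd backfill scan min = 4
osd backfill scan max = 16
"""


def check_commands(conf_text, osd_id=0):
    """Build one admin-socket query per listed option.

    Returns a list of (command, expected_value) tuples. osd_id is a
    placeholder; substitute the id of an OSD local to the node.
    """
    cmds = []
    for line in conf_text.splitlines():
        if "=" not in line:
            continue
        name, _, expected = line.partition("=")
        # ceph.conf accepts spaces in option names; the command line
        # form uses underscores.
        key = "_".join(name.split())
        cmds.append((f"ceph daemon osd.{osd_id} config get {key}",
                     expected.strip()))
    return cmds


for cmd, expected in check_commands(OPTIONS):
    print(f"{cmd}    # expected: {expected}")
```

Comparing what each command returns against the expected value shows whether a setting was actually picked up after the restarts.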
Any hints would be greatly appreciated.
Thanks,
nik
--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava
tel.: +420 591 166 214
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz
mobil servis: +420 737 238 656
email servis: [email protected]
-------------------------------------
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com