Hi Guys
I am in the middle of removing an OSD from my rook-ceph cluster. I ran
'ceph osd out osd.7' and the rebalancing process started, but it has now
stalled with one PG stuck in "active+undersized+degraded". I have done
this before and it has worked fine.
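For reference, the only step I have taken so far is marking the OSD out;
the other two commands are just what I normally run to watch the recovery
progress (plain Ceph CLI, nothing rook-specific):

# ceph osd out osd.7      (mark osd.7 out so CRUSH remaps its PGs)
# ceph -s                 (watch overall recovery / degraded object counts)
# ceph osd df tree        (per-OSD utilisation while data moves)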
# ceph health detail
HEALTH_WARN Degraded data redundancy: 15/94659 objects degraded (0.016%), 1 pg degraded, 1 pg undersized
[WRN] PG_DEGRADED: Degraded data redundancy: 15/94659 objects degraded (0.016%), 1 pg degraded, 1 pg undersized
    pg 3.1f is stuck undersized for 2h, current state active+undersized+degraded, last acting [0,2]
# ceph pg dump_stuck
PG_STAT  STATE                       UP     UP_PRIMARY  ACTING  ACTING_PRIMARY
3.1f     active+undersized+degraded  [0,2]  0           [0,2]   0
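I have not dug any deeper yet; I assume the next step would be to check the
pool's replica size and the PG's peering state, something like:

# ceph osd pool ls detail     (check size/min_size of pool 3)
# ceph pg 3.1f query          (detailed peering/recovery state of the stuck PG)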
I have lots of OSDs on different nodes:
# ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME                         STATUS  REWEIGHT  PRI-AFF
 -1         13.77573  root default
 -5         13.77573      region FSN1
-22          0.73419          zone FSN1-DC13
-21                0              host node5-redacted-com
-27          0.73419              host node7-redacted-com
  1    ssd   0.36710                  osd.1                 up   1.00000  1.00000
  5    ssd   0.36710                  osd.5                 up   1.00000  1.00000
-10          6.20297          zone FSN1-DC14
 -9          6.20297              host node3-redacted-com
  2    ssd   3.10149                  osd.2                 up   1.00000  1.00000
  4    ssd   3.10149                  osd.4                 up   1.00000  1.00000
-18          3.19919          zone FSN1-DC15
-17          3.19919              host node4-redacted-com
  7    ssd   3.19919                  osd.7               down         0  1.00000
 -4          2.90518          zone FSN1-DC16
 -3          2.90518              host node1-redacted-com
  0    ssd   1.45259                  osd.0                 up   1.00000  1.00000
  3    ssd   1.45259                  osd.3                 up   1.00000  1.00000
-14          0.73419          zone FSN1-DC18
-13                0              host node2-redacted-com
-25          0.73419              host node6-redacted-com
 10    ssd   0.36710                  osd.10                up   1.00000  1.00000
 11    ssd   0.36710                  osd.11                up   1.00000  1.00000
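If it matters, I can also dump the CRUSH rule for the pool; my guess is that
it places one replica per zone, but I have not confirmed that yet. The
commands I was planning to run are:

# ceph osd crush rule dump               (show the rule the pool is using)
# ceph osd getcrushmap -o crush.bin      (export the compiled CRUSH map)
# crushtool -d crush.bin -o crush.txt    (decompile it for inspection)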
Any ideas on how to fix this?
Thanks
David