Hello,
We had a network hiccup on a Ceph cluster that caused several of our OSDs to
go out/down. After the network was fixed, the OSDs remained down. We have
restarted them in numerous ways, but they won't come up.
The logs for the down OSDs just repeat this line over and over: "tick checking
mon for new map". There are OSDs on the same host that are up, so there is
connectivity between the OSDs and the mons.
Any advice on where to look for a resolution is appreciated.
Thanks,
Nathan
Cluster was built with cephadm
Ceph Quincy - 17.2.6
Docker version 23.0.2, build 569dd73
Ubuntu 20.04.6 LTS
  cluster:
    id:     aa39fa2a-1510-11ee-953a-bd804ec1ea33
    health: HEALTH_ERR
            Failed to apply 1 service(s): nfs.secstorage
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            Module 'cephadm' has failed: Command '['rados', '-n', 'mgr.cphprodc1-11.uuuhug', '-k', '/var/lib/ceph/mgr/ceph-cphprodc1-11.uuuhug/keyring', '-p', '.nfs', '--namespace', 'secstorage', 'rm', 'grace']' timed out after 10 seconds
            28 osds down
            Reduced data availability: 36 pgs stale
            2 daemons have recently crashed
            1 mgr modules have recently crashed
            945514 slow ops, oldest one blocked for 66804 sec, daemons [mon.cphprodc1-10,mon.cphprodc1-11,mon.cphprodc1-13] have slow ops.

  services:
    mon: 4 daemons, quorum cphprodc1-10,cphprodc1-11,cphprodc1-12,cphprodc1-13 (age 2h)
    mgr: cphprodc1-11.uuuhug(active, since 23h), standbys: cphprodc1-10.upwvbg
    mds: 1/1 daemons up, 1 standby
    osd: 64 osds: 19 up (since 2d), 47 in (since 23h)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   5 pools, 113 pgs
    objects: 151.91k objects, 592 GiB
    usage:   840 GiB used, 81 TiB / 82 TiB avail
    pgs:     65 active+clean
             36 stale+active+clean
             7 active+clean+scrubbing
             5 active+clean+scrubbing+deep
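
As a quick cross-check of the status output above, here is a minimal Python sketch that derives the down and out counts from the "osd:" summary line. The function name parse_osd_summary and the regular expression are illustrative assumptions, not part of any Ceph tooling, and the pattern only covers a summary line shaped like the one pasted here.

```python
import re


def parse_osd_summary(line: str) -> dict:
    """Extract total/up/in counts from an 'osd:' summary line and derive down/out.

    Hypothetical helper: the regex is an assumption based on the single
    summary line shown in this post.
    """
    match = re.search(r"(\d+) osds: (\d+) up.*?, (\d+) in", line)
    if match is None:
        raise ValueError(f"unrecognised osd summary line: {line!r}")
    total, up, in_count = (int(group) for group in match.groups())
    return {
        "total": total,
        "up": up,
        "in": in_count,
        "down": total - up,       # OSDs not currently up
        "out": total - in_count,  # OSDs not currently in
    }


summary = parse_osd_summary("osd: 64 osds: 19 up (since 2d), 47 in (since 23h)")
print(summary)
# {'total': 64, 'up': 19, 'in': 47, 'down': 45, 'out': 17}
```

The summary line therefore implies 45 OSDs down and 17 out. Note that 45 - 17 = 28, which equals the "28 osds down" health figure; the tree listing is consistent with that being the count of OSDs that are down but still in, though that reading is an inference from these numbers.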
osd tree:
ID   CLASS  WEIGHT     TYPE NAME               STATUS  REWEIGHT  PRI-AFF
 -1         111.78223  root default
 -5          27.94556      host cphprodc1-10
  1    ssd    1.74660          osd.1             down   1.00000  1.00000
  5    ssd    1.74660          osd.5             down   1.00000  1.00000
  8    ssd    1.74660          osd.8             down   1.00000  1.00000
 12    ssd    1.74660          osd.12            down   1.00000  1.00000
 14    ssd    1.74660          osd.14            down   1.00000  1.00000
 18    ssd    1.74660          osd.18            down         0  1.00000
 22    ssd    1.74660          osd.22            down         0  1.00000
 26    ssd    1.74660          osd.26            down         0  1.00000
 30    ssd    1.74660          osd.30            down         0  1.00000
 34    ssd    1.74660          osd.34            down   1.00000  1.00000
 37    ssd    1.74660          osd.37            down   1.00000  1.00000
 41    ssd    1.74660          osd.41            down   1.00000  1.00000
 45    ssd    1.74660          osd.45              up   1.00000  1.00000
 48    ssd    1.74660          osd.48              up   1.00000  1.00000
 52    ssd    1.74660          osd.52              up   1.00000  1.00000
 56    ssd    1.74660          osd.56              up   1.00000  1.00000
 -7          27.94556      host cphprodc1-11
  2    ssd    1.74660          osd.2             down         0  1.00000
  6    ssd    1.74660          osd.6             down   1.00000  1.00000
 10    ssd    1.74660          osd.10            down   1.00000  1.00000
 16    ssd    1.74660          osd.16            down   1.00000  1.00000
 20    ssd    1.74660          osd.20            down         0  1.00000
 24    ssd    1.74660          osd.24            down         0  1.00000
 28    ssd    1.74660          osd.28            down         0  1.00000
 32    ssd    1.74660          osd.32            down         0  1.00000
 36    ssd    1.74660          osd.36            down   1.00000  1.00000
 40    ssd    1.74660          osd.40            down   1.00000  1.00000
 44    ssd    1.74660          osd.44            down   1.00000  1.00000
 50    ssd    1.74660          osd.50              up   1.00000  1.00000
 54    ssd    1.74660          osd.54              up   1.00000  1.00000
 58    ssd    1.74660          osd.58              up   1.00000  1.00000
 60    ssd    1.74660          osd.60              up   1.00000  1.00000
 62    ssd    1.74660          osd.62              up   1.00000  1.00000
 -3          27.94556      host cphprodc1-12
  0    ssd    1.74660          osd.0             down   1.00000  1.00000
  4    ssd    1.74660          osd.4             down   1.00000  1.00000
  7    ssd    1.74660          osd.7             down   1.00000  1.00000
 11    ssd    1.74660          osd.11            down   1.00000  1.00000
 15    ssd    1.74660          osd.15            down   1.00000  1.00000
 19    ssd    1.74660          osd.19            down         0  1.00000
 23    ssd    1.74660          osd.23            down         0  1.00000
 27    ssd    1.74660          osd.27            down         0  1.00000
 31    ssd    1.74660          osd.31            down         0  1.00000
 35    ssd    1.74660          osd.35            down   1.00000  1.00000
 38    ssd    1.74660          osd.38            down   1.00000  1.00000
 42    ssd    1.74660          osd.42            down   1.00000  1.00000
 46    ssd    1.74660          osd.46              up   1.00000  1.00000
 49    ssd    1.74660          osd.49              up   1.00000  1.00000
 53    ssd    1.74660          osd.53              up   1.00000  1.00000
 57    ssd    1.74660          osd.57              up   1.00000  1.00000
 -9          27.94556      host cphprodc1-13
  3    ssd    1.74660          osd.3             down   1.00000  1.00000
  9    ssd    1.74660          osd.9             down   1.00000  1.00000
 13    ssd    1.74660          osd.13            down   1.00000  1.00000
 17    ssd    1.74660          osd.17            down   1.00000  1.00000
 21    ssd    1.74660          osd.21            down         0  1.00000
 25    ssd    1.74660          osd.25            down         0  1.00000
 29    ssd    1.74660          osd.29            down         0  1.00000
 33    ssd    1.74660          osd.33            down         0  1.00000
 39    ssd    1.74660          osd.39            down   1.00000  1.00000
 43    ssd    1.74660          osd.43            down   1.00000  1.00000
 47    ssd    1.74660          osd.47              up   1.00000  1.00000
 51    ssd    1.74660          osd.51              up   1.00000  1.00000
 55    ssd    1.74660          osd.55              up   1.00000  1.00000
 59    ssd    1.74660          osd.59              up   1.00000  1.00000
 61    ssd    1.74660          osd.61              up   1.00000  1.00000
 63    ssd    1.74660          osd.63              up   1.00000  1.00000
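
To make the pattern in the tree easier to see, here is a minimal Python sketch that tallies up, down, and REWEIGHT-0 OSDs per host from pasted tree rows. The helper name tally_osd_tree is hypothetical, and the parsing assumes the seven-column row layout shown above (it splits on whitespace, so column alignment does not matter). The sample input is a short excerpt of the listing, not the full tree.

```python
from collections import defaultdict


def tally_osd_tree(text: str) -> dict:
    """Count up/down OSDs and REWEIGHT-0 ("out") OSDs per host.

    Hypothetical helper: assumes host rows look like
    '<id> <weight> host <name>' and OSD rows have seven whitespace-separated
    fields, as in the listing above.
    """
    hosts = defaultdict(lambda: {"up": 0, "down": 0, "out": 0})
    current_host = None
    for raw_line in text.splitlines():
        fields = raw_line.split()
        if len(fields) == 4 and fields[2] == "host":
            current_host = fields[3]
        elif len(fields) == 7 and fields[3].startswith("osd.") and current_host:
            status, reweight = fields[4], float(fields[5])
            if status in ("up", "down"):
                hosts[current_host][status] += 1
            if reweight == 0:
                hosts[current_host]["out"] += 1
    return dict(hosts)


sample = """\
-5 27.94556 host cphprodc1-10
1 ssd 1.74660 osd.1 down 1.00000 1.00000
18 ssd 1.74660 osd.18 down 0 1.00000
45 ssd 1.74660 osd.45 up 1.00000 1.00000
-7 27.94556 host cphprodc1-11
2 ssd 1.74660 osd.2 down 0 1.00000
50 ssd 1.74660 osd.50 up 1.00000 1.00000
"""

for host, counts in tally_osd_tree(sample).items():
    print(host, counts)
# cphprodc1-10 {'up': 1, 'down': 2, 'out': 1}
# cphprodc1-11 {'up': 1, 'down': 1, 'out': 1}
```

Counting the full listing above by hand gives 45 OSDs down in total (12, 11, 12, and 10 on cphprodc1-10 through cphprodc1-13), 17 of them with REWEIGHT 0, and 19 up. Every host has both up and down OSDs, and the up OSDs are all among the higher-numbered IDs (45 and above).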