On 14-09-2023 17:32, Nathan Gleason wrote:
> Hello,
>
> We had a network hiccup with a Ceph cluster and it made several of our
> OSDs go out/down. After the network was fixed the OSDs remain down. We
> have restarted them in numerous ways and they won't come up. The logs
> for the down OSDs just repeat this line over and over: "tick checking
> mon for new map". There are OSDs on the same host that are up, so there
> is connectivity between the OSDs and mons. Any advice on where to look
> for a resolution is appreciated.
>
> Thanks,
> Nathan
>
> Cluster was built with cephadm
> Ceph Quincy - 17.2.6
> Docker version 23.0.2, build 569dd73
> Ubuntu 20.04.6 LTS
>
>   cluster:
>     id:     aa39fa2a-1510-11ee-953a-bd804ec1ea33
>     health: HEALTH_ERR
>             Failed to apply 1 service(s): nfs.secstorage
>             1 filesystem is degraded
>             1 MDSs report slow metadata IOs
>             Module 'cephadm' has failed: Command '['rados', '-n', 'mgr.cphprodc1-11.uuuhug', '-k', '/var/lib/ceph/mgr/ceph-cphprodc1-11.uuuhug/keyring', '-p', '.nfs', '--namespace', 'secstorage', 'rm', 'grace']' timed out after 10 seconds
>             28 osds down
>             Reduced data availability: 36 pgs stale
>             2 daemons have recently crashed
>             1 mgr modules have recently crashed
>             945514 slow ops, oldest one blocked for 66804 sec, daemons [mon.cphprodc1-10,mon.cphprodc1-11,mon.cphprodc1-13] have slow ops.
>
>   services:
>     mon: 4 daemons, quorum cphprodc1-10,cphprodc1-11,cphprodc1-12,cphprodc1-13 (age 2h)
>     mgr: cphprodc1-11.uuuhug(active, since 23h), standbys: cphprodc1-10.upwvbg
>     mds: 1/1 daemons up, 1 standby
>     osd: 64 osds: 19 up (since 2d), 47 in (since 23h)
What happens if you manually mark all OSDs in?

Side note: for that many OSDs there are only a few PGs. Is that on purpose?

Gr. Stefan
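
For reference, here is a minimal sketch of how "setting all OSDs in" might be scripted. It assumes the JSON from `ceph osd dump --format json` carries an `osds` array whose entries have `osd`, `up` and `in` fields, and that `ceph osd in <id> [<id>...]` is the command used to mark them in. The helper names and the sample data are hypothetical and only illustrate the idea. Note that marking an OSD in only affects data placement; it does not by itself bring a down daemon back up.

```python
import json

# Hypothetical, trimmed-down sample of `ceph osd dump --format json` output.
# Real output has many more fields per OSD; only the ones used here are shown.
SAMPLE_OSD_DUMP = """
{"osds": [
  {"osd": 0, "up": 1, "in": 1},
  {"osd": 1, "up": 0, "in": 0},
  {"osd": 2, "up": 0, "in": 1},
  {"osd": 3, "up": 1, "in": 0}
]}
"""


def osds_marked_out(osd_dump_json):
    """Return the sorted IDs of OSDs whose 'in' flag is 0 (i.e. marked out)."""
    dump = json.loads(osd_dump_json)
    return sorted(o["osd"] for o in dump.get("osds", []) if not o.get("in"))


def mark_in_command(osd_ids):
    """Build the argv for `ceph osd in ...`, or None if there is nothing to do."""
    if not osd_ids:
        return None
    return ["ceph", "osd", "in"] + [str(i) for i in osd_ids]


if __name__ == "__main__":
    out_ids = osds_marked_out(SAMPLE_OSD_DUMP)
    cmd = mark_in_command(out_ids)
    # Only print the command; running it against a live cluster is left
    # to the operator (e.g. via subprocess.run(cmd, check=True)).
    if cmd:
        print(" ".join(cmd))
```

On a real cluster the JSON would come from the `ceph` CLI rather than the embedded sample, and it is probably worth reviewing the generated command before running it.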
