On 14-09-2023 17:32, Nathan Gleason wrote:
Hello,

We had a network hiccup on a Ceph cluster that caused several of our OSDs to go 
out/down.  After the network was fixed, the OSDs remained down.  We have restarted 
them in numerous ways and they won’t come up.

The logs for the down OSDs just repeat this line over and over: "tick checking 
mon for new map".  Other OSDs on the same host are up, so there is 
connectivity between the OSDs and the mons.


Any advice on where to look for a resolution is appreciated.

Thanks,
Nathan

Cluster was built with cephadm
Ceph Quincy - 17.2.6
Docker version 23.0.2, build 569dd73
Ubuntu 20.04.6 LTS

   cluster:
     id:     aa39fa2a-1510-11ee-953a-bd804ec1ea33
     health: HEALTH_ERR
             Failed to apply 1 service(s): nfs.secstorage
             1 filesystem is degraded
             1 MDSs report slow metadata IOs
             Module 'cephadm' has failed: Command '['rados', '-n', 
'mgr.cphprodc1-11.uuuhug', '-k', 
'/var/lib/ceph/mgr/ceph-cphprodc1-11.uuuhug/keyring', '-p', '.nfs', 
'--namespace', 'secstorage', 'rm', 'grace']' timed out after 10 seconds
             28 osds down
             Reduced data availability: 36 pgs stale
             2 daemons have recently crashed
             1 mgr modules have recently crashed
             945514 slow ops, oldest one blocked for 66804 sec, daemons 
[mon.cphprodc1-10,mon.cphprodc1-11,mon.cphprodc1-13] have slow ops.
   services:
     mon: 4 daemons, quorum cphprodc1-10,cphprodc1-11,cphprodc1-12,cphprodc1-13 
(age 2h)
     mgr: cphprodc1-11.uuuhug(active, since 23h), standbys: cphprodc1-10.upwvbg
     mds: 1/1 daemons up, 1 standby
     osd: 64 osds: 19 up (since 2d), 47 in (since 23h)

What happens if you set all OSDs in manually?
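
To make that concrete, here is a rough sketch (not tested against this cluster) that lists the OSDs the OSD map currently marks as out and prints the matching `ceph osd in` command for review. It assumes the JSON output of `ceph osd dump --format json` contains an "osds" list whose entries carry "osd" and "in" fields; the helper names `out_osd_ids` and `build_in_command` are made up for illustration.

```python
import json
import subprocess


def out_osd_ids(osd_dump):
    """Return the ids of OSDs that the OSD map marks as out ("in" == 0)."""
    return sorted(o["osd"] for o in osd_dump.get("osds", []) if o.get("in", 0) == 0)


def build_in_command(osd_ids):
    """Build the `ceph osd in <id> ...` argument list for the given ids."""
    return ["ceph", "osd", "in"] + [str(i) for i in osd_ids]


if __name__ == "__main__":
    # Assumption: the ceph CLI is on PATH and has a working admin keyring.
    raw = subprocess.run(
        ["ceph", "osd", "dump", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    ids = out_osd_ids(json.loads(raw))
    if ids:
        # Print only; run the command by hand once the list looks right.
        print(" ".join(build_in_command(ids)))
```

Marking an OSD in does not start a daemon that is down, so this only answers whether the out state is part of the problem.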

Side note: For that many OSDs there are only a few PGs. Is that on purpose?
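
For comparison, a back-of-the-envelope sketch of the usual rule of thumb (roughly 100 PGs per OSD, divided by the replica count, rounded to a power of two). The pool size of 3 and the target of 100 are assumptions, not values taken from this cluster, and `suggested_total_pgs` is just an illustrative helper.

```python
def suggested_total_pgs(num_osds, pool_size=3, target_per_osd=100):
    """Rule-of-thumb total PG count, rounded to the nearest power of two."""
    raw = num_osds * target_per_osd / pool_size
    if raw < 1:
        return 1
    lower = 1 << (int(raw).bit_length() - 1)  # largest power of two <= raw
    upper = lower * 2
    return lower if raw - lower < upper - raw else upper


# 64 OSDs, assumed 3x replication: 64 * 100 / 3 is about 2133, nearest power of two is 2048
print(suggested_total_pgs(64))
```

This is only a sanity check on the order of magnitude; the PG autoscaler, if enabled, may pick different per-pool values.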

Gr. Stefan
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
