Hello,
During OSD restarts with Jewel (10.2.5 and 10.2.6 at least) I've seen
"stuck inactive for more than 300 seconds" errors like this when watching
the cluster with "watch ceph -s":
---
     health HEALTH_ERR
            59 pgs are stuck inactive for more than 300 seconds
            223 pgs degraded
            74 pgs peering
            84 pgs stale
            59 pgs stuck inactive
            297 pgs stuck unclean
            223 pgs undersized
            recovery 38420/179352 objects degraded (21.422%)
            2/16 in osds are down
---
Now this is neither reflected in any logs, nor is it true, of course: the
restarts take a few seconds per OSD and the cluster is fully recovered
to HEALTH_OK in 12 seconds or so.
But it surely is a good scare for somebody not doing this on a test
cluster.
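
For anyone who wants to check this on their own cluster, here is a rough
sketch (not a tested tool) that pulls the PG counts out of the plain-text
health lines as pasted above, and times how long the cluster actually stays
out of HEALTH_OK. The function names, the sample text and the polling
interval are my own assumptions; the only Ceph command it calls is
"ceph health".

```python
import re
import subprocess
import time

# Sample taken from the health summary quoted in this mail.
SAMPLE = """\
health HEALTH_ERR
59 pgs are stuck inactive for more than 300 seconds
223 pgs degraded
74 pgs peering
84 pgs stale
59 pgs stuck inactive
297 pgs stuck unclean
223 pgs undersized
recovery 38420/179352 objects degraded (21.422%)
2/16 in osds are down
"""


def parse_health(text):
    """Return (status, {pg_state_description: count}) from plain-text health lines."""
    status = None
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"health\s+(HEALTH_\w+)", line)
        if m:
            status = m.group(1)
            continue
        # Only lines of the form "<N> pgs <description>" are counted.
        m = re.match(r"(\d+) pgs (.+)", line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return status, counts


def time_until_ok(poll=1.0, timeout=600.0):
    """Poll "ceph health" and return seconds elapsed until HEALTH_OK, or None on timeout."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        out = subprocess.run(
            ["ceph", "health"], capture_output=True, text=True
        ).stdout
        if out.startswith("HEALTH_OK"):
            return time.monotonic() - start
        time.sleep(poll)
    return None


if __name__ == "__main__":
    status, counts = parse_health(SAMPLE)
    print(status, counts)
```

Calling time_until_ok() right after restarting an OSD gives the real time
spent outside HEALTH_OK, which can then be compared against the "300
seconds" claimed in the status output.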
Anybody else seeing this?
Christian
--
Christian Balzer Network/Systems Engineer
[email protected] Global OnLine Japan/Rakuten Communications
http://www.gol.com/