[ceph-users] Re: disk failure

Ashley Merrick Thu, 05 Sep 2019 11:14:40 -0700

I would suggest checking the logs to see the exact reason it's being marked 
out.


If the disk is being hit hard and there are heavy I/O delays, then Ceph may see 
that as a delayed reply outside of the set window and mark the OSD as out.

There are some variables that can be changed to give an OSD more time to reply 
to a heartbeat, but I would definitely suggest checking the OSD log at the time 
of the disk being marked out to see exactly what's going on.

The last thing you want to do is just patch over an actual issue, if there is 
one.
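
For illustration only, here is a minimal Python sketch of the timing idea 
above: a slow heartbeat reply gets an OSD reported down, and it is only marked 
out if it stays down long enough. The function name classify and both 
constants are hypothetical. The defaults are assumed to mirror the Ceph options 
osd_heartbeat_grace and mon_osd_down_out_interval, so check the values for your 
release. This is a simplified model, not Ceph's actual failure detection.

```python
# Simplified model, NOT Ceph's real logic: an OSD is treated as "down" when a
# heartbeat reply takes longer than the grace window, and as "out" only if it
# has stayed down for at least the down-to-out interval.

HEARTBEAT_GRACE_S = 20      # assumption: mirrors osd_heartbeat_grace
DOWN_OUT_INTERVAL_S = 600   # assumption: mirrors mon_osd_down_out_interval


def classify(reply_delay_s, downtime_s,
             grace_s=HEARTBEAT_GRACE_S,
             down_out_s=DOWN_OUT_INTERVAL_S):
    """Return 'up', 'down', or 'out' for one hypothetical observation."""
    if reply_delay_s <= grace_s:
        return "up"      # reply arrived inside the grace window
    if downtime_s < down_out_s:
        return "down"    # late reply, but not down long enough to be marked out
    return "out"         # down past the interval, so data would start to move


if __name__ == "__main__":
    print(classify(5, 0))                      # up
    print(classify(45, 120))                   # down
    print(classify(45, 900))                   # out
    print(classify(45, 900, down_out_s=1800))  # down
```

In this model, a disk that stalls for two minutes and then recovers is only 
ever "down", so nothing rebalances. Lengthening the down-to-out interval widens 
that margin, but it also hides a disk that really is failing, which is why the 
OSD log should be checked first.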


---- On Fri, 06 Sep 2019 02:11:06 +0800 [email protected] wrote ----


No, I mean Ceph sees it as a failure and marks it out for a while.



On Thu, Sep 5, 2019 at 11:00 AM Ashley Merrick <[email protected]> wrote:

Is your HD actually failing and vanishing from the OS and then coming back 
shortly?

Or do you just mean your OSD is crashing and then restarting itself shortly 
afterwards?



---- On Fri, 06 Sep 2019 01:55:25 +0800 [email protected] wrote ----


One thing I've noticed is that when HDDs fail, they often recover within a 
short time and get added back to the cluster.  This causes the data to 
rebalance back and forth, and if I set the noout flag I get a health 
warning.  Is there a better way to avoid this?




_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
