Re: [ceph-users] Why one crippled osd can slow down or block all request to the whole ceph cluster?

shadow_lin Wed, 07 Mar 2018 04:37:31 -0800

What you said makes sense.
I have encountered a few hardware-related issues that caused one OSD to behave 
abnormally and block all IO for the whole cluster (all OSDs in one pool), which 
made me think about how to avoid this situation.


2018-03-07 

shadow_lin 



From: David Turner <[email protected]>
Sent: 2018-03-07 13:51
Subject: Re: Re: [ceph-users] Why one crippled osd can slow down or block all request 
to the whole ceph cluster?
To: "shadow_lin"<[email protected]>
Cc: "ceph-users"<[email protected]>

Marking osds down is not without risks. You are taking away one of the copies 
of data for every PG on that osd. Also you are causing every PG on that osd to 
peer. If that osd comes back up, every PG on it again needs to peer and then 
they need to recover.


That is a lot of load and risk to automate into the system. Now let's take 
into consideration other causes of slow requests, like having more IO load than 
your spindles can handle, backfilling settings set too aggressively (related to 
the first cause), or networking problems. If the mon were detecting slow 
requests on OSDs and marking them down, you could end up marking half of your 
cluster down or causing corrupt data by flapping OSDs.


The mon will mark OSDs down if the conditions in those settings I mentioned are 
met. If the OSD is still responsive enough to answer other OSDs and the mons, 
then there really isn't much that Ceph can do to automate this safely. There 
are just so many variables. If Ceph were a closed system on specific hardware, 
it could certainly be monitoring that hardware closely for early warning 
signs... But people are running Ceph on everything they can compile it for, 
including Raspberry Pis. The cluster admin, however, should be able to add 
their own early detection for failures.


You can monitor a lot about disks including things such as average await in a 
host to see if the disks are taking longer than normal to respond. That 
particular check led us to find that we had several storage nodes with bad 
cache batteries on the controllers. Finding that explained some slowness we had 
noticed in the cluster. It also led us to a better method to catch that 
scenario sooner.
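
As a rough illustration of that kind of check, here is a minimal Python sketch. 
It assumes a Linux host, where /proc/diskstats exposes per-device counters of 
completed I/Os and milliseconds spent on them, and it computes the average 
await between two samples. The function names and the outlier thresholds 
(factor, floor_ms) are hypothetical, chosen only for this example; it is not 
the check we actually ran.

```python
from statistics import median


def parse_diskstats(text):
    """Parse /proc/diskstats-style text into {device: (ios_completed, ms_spent)}.

    Note: the file also lists partitions; filter names as needed.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # skip blank or truncated lines
        name = fields[2]
        reads, read_ms = int(fields[3]), int(fields[6])
        writes, write_ms = int(fields[7]), int(fields[10])
        stats[name] = (reads + writes, read_ms + write_ms)
    return stats


def average_await(before, after):
    """Average milliseconds per completed I/O for each device between two samples."""
    result = {}
    for name, (ios_after, ms_after) in after.items():
        if name not in before:
            continue
        ios_before, ms_before = before[name]
        delta_ios = ios_after - ios_before
        if delta_ios > 0:  # idle devices have no meaningful await
            result[name] = (ms_after - ms_before) / delta_ios
    return result


def slow_disks(awaits, factor=3.0, floor_ms=10.0):
    """Devices whose await exceeds floor_ms and factor times the host median.

    Both thresholds are assumptions for illustration; tune them for your hardware.
    """
    if not awaits:
        return []
    typical = median(awaits.values())
    return sorted(
        name for name, value in awaits.items()
        if value > floor_ms and value > factor * typical
    )
```

In practice you would read /proc/diskstats twice, some seconds apart, pass both 
snapshots through parse_diskstats and average_await, and alert on whatever 
slow_disks returns. Comparing against the host median rather than a fixed 
number is one possible design choice; it tends to highlight a single bad disk 
or controller without firing when the whole host is simply busy.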


On Tue, Mar 6, 2018, 11:22 PM shadow_lin <[email protected]> wrote:

Hi Turner,
Thanks for your insight.
I am wondering: if the mon can detect slow/blocked requests from a certain OSD, 
why can't the mon mark an OSD with blocked requests down once the requests have 
been blocked for a certain time?

2018-03-07 

shadow_lin 



From: David Turner <[email protected]>
Sent: 2018-03-06 23:56
Subject: Re: [ceph-users] Why one crippled osd can slow down or block all request to 
the whole ceph cluster?
To: "shadow_lin"<[email protected]>
Cc: "ceph-users"<[email protected]>

There are multiple settings that affect this.  osd_heartbeat_grace is probably 
the most apt.  If an OSD is not getting a response from another OSD for more 
than the heartbeat_grace period, then it will tell the mons that the OSD is 
down.  Once mon_osd_min_down_reporters OSDs have told the mons that an OSD is 
down, then the OSD will be marked down by the cluster.  If the OSD does not 
then talk to the mons directly to say that it is up, it will be marked out 
after mon_osd_down_out_interval is reached.  If it does talk to the mons to say 
that it is up, then it should be responding again and be fine. 
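
The sequence above can be sketched as a toy model in Python. This is only an 
illustration of the logic as described here, not Ceph's implementation: the 
ToyMonitor class and its methods are hypothetical, and the default values are 
illustrative stand-ins for osd_heartbeat_grace, mon_osd_min_down_reporters and 
mon_osd_down_out_interval that may not match your release.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMonitor:
    # Illustrative values only; check the defaults for your own release.
    heartbeat_grace: float = 20.0      # stands in for osd_heartbeat_grace (s)
    min_down_reporters: int = 2        # stands in for mon_osd_min_down_reporters
    down_out_interval: float = 600.0   # stands in for mon_osd_down_out_interval (s)
    reporters: dict = field(default_factory=dict)   # target -> set of reporting OSDs
    down_since: dict = field(default_factory=dict)  # target -> time it was marked down

    def peer_silence(self, reporter, target, silent_for, now):
        """A peer reports `target` only once it has been silent past the grace period."""
        if silent_for <= self.heartbeat_grace:
            return  # heartbeats still arriving in time: nothing is reported
        self.reporters.setdefault(target, set()).add(reporter)
        if len(self.reporters[target]) >= self.min_down_reporters:
            # Enough distinct reporters: the OSD is marked down.
            self.down_since.setdefault(target, now)

    def osd_boot(self, target):
        """The OSD tells the mons directly that it is up again."""
        self.reporters.pop(target, None)
        self.down_since.pop(target, None)

    def state(self, target, now):
        if target not in self.down_since:
            return "up+in"
        if now - self.down_since[target] >= self.down_out_interval:
            return "down+out"
        return "down+in"


mon = ToyMonitor()
mon.peer_silence("osd.1", "osd.7", silent_for=25, now=25)
mon.peer_silence("osd.2", "osd.7", silent_for=26, now=26)
print(mon.state("osd.7", now=30))    # marked down, still in
print(mon.state("osd.7", now=700))   # past the down-out interval
```

The model also shows why the half-dead case slips through: as long as the OSD 
keeps answering heartbeats within the grace period, peer_silence never records 
a report, so the OSD is never marked down no matter how slow its requests are.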


In your case where the OSD is half up, half down... I believe all you can 
really do is monitor your cluster and troubleshoot OSDs causing problems like 
this.  Basically every storage solution is vulnerable to this.  Sometimes an 
OSD just needs to be restarted due to being in a bad state somehow, or simply 
removed from the cluster because the disk is going bad.


On Sun, Mar 4, 2018 at 2:28 AM shadow_lin <[email protected]> wrote:

Hi list,
During my testing of Ceph, I found that sometimes the whole cluster was blocked, 
and the cause was one malfunctioning OSD. Ceph can heal itself if an OSD is 
down, but it seems that if an OSD is half dead (it has a heartbeat but can't 
handle requests), then all requests directed to that OSD are blocked. If all 
OSDs are in one pool, the whole cluster is blocked by that one hung OSD.
I think this is because Ceph tries to distribute requests across all OSDs, and 
if one OSD won't confirm that a request is done, then everything is blocked.
Is there a way to have Ceph mark the crippled OSD down if the requests directed 
to it are blocked for more than a certain time, so that the whole cluster is 
not blocked?

2018-03-04


shadow_lin 
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
