Re: [ceph-users] slow requests/blocked

Jeff Thu, 20 Nov 2014 07:31:09 -0800

Thanks.  I should have mentioned that the errors are pretty well
distributed across the cluster:


ceph1: /var/log/ceph/ceph-osd.0.log       71
ceph1: /var/log/ceph/ceph-osd.1.log      112
ceph1: /var/log/ceph/ceph-osd.2.log       38
ceph2: /var/log/ceph/ceph-osd.3.log       88
ceph2: /var/log/ceph/ceph-osd.4.log       54
ceph3: /var/log/ceph/ceph-osd.5.log       36
ceph3: /var/log/ceph/ceph-osd.6.log       48
ceph3: /var/log/ceph/ceph-osd.7.log       39
ceph3: /var/log/ceph/ceph-osd.8.log       40
ceph4: /var/log/ceph/ceph-osd.10.log      95
ceph4: /var/log/ceph/ceph-osd.9.log      139
ceph5: /var/log/ceph/ceph-osd.11.log      81
ceph5: /var/log/ceph/ceph-osd.12.log     393
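
For anyone wanting to produce a similar per-log tally, here is a minimal Python sketch. It is not the command used for the numbers above. The search text "slow request", the default directory, and the helper names count_matches and count_per_log are all assumptions chosen for illustration; adjust the pattern to whatever your OSD logs actually contain.

```python
import re
from pathlib import Path

# Assumption: slow-request warnings contain this text; adjust to match your logs.
PATTERN = re.compile(r"slow request")


def count_matches(lines, pattern=PATTERN):
    """Count the lines that contain the pattern."""
    return sum(1 for line in lines if pattern.search(line))


def count_per_log(log_dir="/var/log/ceph", glob="ceph-osd.*.log"):
    """Return {log path: number of matching lines} for each OSD log found."""
    counts = {}
    for path in sorted(Path(log_dir).glob(glob)):
        # errors="replace" keeps the scan going if a log has stray bytes.
        with path.open(errors="replace") as fh:
            counts[str(path)] = count_matches(fh)
    return counts


if __name__ == "__main__":
    for path, n in count_per_log().items():
        print(f"{path:<40}{n:>6}")
```

Run on each node, this prints one line per OSD log with its match count.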

I'll try to catch them while they're happening and see what I can
learn.
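
One way to catch them live is to save the JSON printed by the dump_ops_in_flight command suggested in the quoted reply below and filter it for old ops. The sketch below assumes the output has a top-level "ops" list whose entries carry "age" and "description" fields; field names can differ between releases, so treat them as assumptions. The function name slowest_ops and the 30 second threshold are hypothetical.

```python
import json


def slowest_ops(dump_text, min_age=30.0):
    """Return (age, description) pairs for ops at least min_age seconds old,
    oldest first.

    Assumption: dump_text is JSON with an "ops" list, and each op has
    "age" (number or numeric string) and "description" fields.
    """
    data = json.loads(dump_text)
    ops = data.get("ops", [])
    aged = [(float(op.get("age", 0)), op.get("description", "?")) for op in ops]
    return sorted((item for item in aged if item[0] >= min_age), reverse=True)


if __name__ == "__main__":
    import sys

    # Usage: pipe the saved JSON dump into this script on stdin.
    for age, description in slowest_ops(sys.stdin.read()):
        print(f"{age:10.1f}s  {description}")
```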

Thanks again!!

Jeff


On Thu, Nov 20, 2014 at 06:40:57AM -0800, Jean-Charles LOPEZ wrote:
> Hi Jeff,
> 
> it would probably be wise to first check what these slow requests are:
> 1) ceph health detail -> This will tell you which OSDs are experiencing the 
> slow requests
> 2) ceph daemon osd.{id} dump_ops_in_flight -> Issued on one of the
> above OSDs, this will tell you what these ops are waiting for.
> 
> My best guess is that either you have a network problem or some other drives
> in your cluster are about to die or are experiencing write errors, causing
> retries and slowing down request processing.
> 
> Just to be sure, if your drives are SMART capable, use smartctl to look at
> the stats for the drives you will have potentially identified in the steps
> above.
> 
> Regards
> JC
> 
> 
> 
> > On Nov 20, 2014, at 06:00, Jeff <[email protected]> wrote:
> > 
> > Hi,
> > 
> >     We have a five node cluster that has been running for a long
> > time (over a year).  A few weeks ago we upgraded to 0.87 (giant) and 
> > things continued to work well.  
> > 
> >     Last week a drive failed on one of the nodes.  We replaced the
> > drive and things were working well again.
> > 
> >     After about six days we started getting lots of "slow
> > requests...blocked for..." messages (100's/hour) and performance has been
> > terrible.  Since then we've made sure to have all of the latest OS patches
> > and rebooted all five nodes.  We are still seeing a lot of slow
> > requests/blocked messages.  Any idea(s) on what's wrong/where to look?
> > 
> > Thanks!
> >     Jeff
> > -- 
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
===============================================================================
                        Jeff's Used Movie Finder    
                     http://www.usedmoviefinder.com
                    email: [email protected]
