Thanks. I should have mentioned that the errors are spread across the
whole cluster (host: log file, count):
ceph1: /var/log/ceph/ceph-osd.0.log 71
ceph1: /var/log/ceph/ceph-osd.1.log 112
ceph1: /var/log/ceph/ceph-osd.2.log 38
ceph2: /var/log/ceph/ceph-osd.3.log 88
ceph2: /var/log/ceph/ceph-osd.4.log 54
ceph3: /var/log/ceph/ceph-osd.5.log 36
ceph3: /var/log/ceph/ceph-osd.6.log 48
ceph3: /var/log/ceph/ceph-osd.7.log 39
ceph3: /var/log/ceph/ceph-osd.8.log 40
ceph4: /var/log/ceph/ceph-osd.9.log 139
ceph4: /var/log/ceph/ceph-osd.10.log 95
ceph5: /var/log/ceph/ceph-osd.11.log 81
ceph5: /var/log/ceph/ceph-osd.12.log 393
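
A rough sketch of how per-log counts like these could be gathered on each
node. It assumes a count is simply the number of log lines containing the
string "slow request"; that needle, the glob pattern default and the function
names are illustrative assumptions, not necessarily how the numbers above
were produced.

```python
import glob


def count_slow_requests(lines, needle="slow request"):
    """Count the lines that contain the needle string."""
    return sum(1 for line in lines if needle in line)


def count_per_log(pattern="/var/log/ceph/ceph-osd.*.log"):
    """Return {log path: matching line count} for every log matching pattern."""
    counts = {}
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan
        with open(path, errors="replace") as handle:
            counts[path] = count_slow_requests(handle)
    return counts
```

Printing the items of count_per_log() on each node gives one "path count"
pair per OSD log.
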
I'll try to catch them while they're happening and see what I can
learn.
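
One way to catch them while they are happening would be to poll the two
commands JC suggested. The sketch below has not been run against a live
cluster and rests on assumptions: the regular expression expects lines of the
form "N ops are blocked > X sec on osd.Y" in the `ceph health detail` output
(adjust it if your release prints something different), and the names
blocked_osds, dump_ops and snapshot are made up for illustration. Since
`ceph daemon` talks to the local admin socket, the dump only succeeds on the
node that hosts the OSD in question.

```python
import re
import subprocess

# Assumed format of the per-OSD lines in `ceph health detail`.
BLOCKED_RE = re.compile(r"(\d+) ops are blocked > ([\d.]+) sec on osd\.(\d+)")


def blocked_osds(health_detail_text):
    """Return {osd id: total blocked ops} parsed from `ceph health detail` text."""
    totals = {}
    for count, _seconds, osd_id in BLOCKED_RE.findall(health_detail_text):
        totals[int(osd_id)] = totals.get(int(osd_id), 0) + int(count)
    return totals


def dump_ops(osd_id):
    """Return the raw dump_ops_in_flight output, or None if the call fails
    (for example because the OSD lives on another node)."""
    result = subprocess.run(
        ["ceph", "daemon", "osd.%d" % osd_id, "dump_ops_in_flight"],
        capture_output=True, text=True,
    )
    return result.stdout if result.returncode == 0 else None


def snapshot():
    """Print the OSDs that currently report blocked ops, plus local op dumps."""
    health = subprocess.run(
        ["ceph", "health", "detail"],
        capture_output=True, text=True, check=True,
    ).stdout
    totals = blocked_osds(health)
    for osd_id in sorted(totals):
        print("osd.%d: %d blocked ops" % (osd_id, totals[osd_id]))
        ops = dump_ops(osd_id)
        if ops is not None:
            print(ops)
    return totals
```

Calling snapshot() every few seconds in a loop (with time.sleep between
calls) on each node should capture the in-flight ops while requests are
still blocked.
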
Thanks again!!
Jeff
On Thu, Nov 20, 2014 at 06:40:57AM -0800, Jean-Charles LOPEZ wrote:
> Hi Jeff,
>
> It would probably be wise to first check what these slow requests are:
> 1) ceph health detail -> This will tell you which OSDs are experiencing the
> slow requests
> 2) ceph daemon osd.{id} dump_ops_in_flight -> Run this on one of the
> above OSDs; it will tell you what these ops are waiting for.
>
> My guess is that either you have a network problem, or some other drives
> in your cluster are about to die or are experiencing write errors that cause
> retries and slow down request processing.
>
> Just to be sure, if your drives are SMART capable, use smartctl to look at
> the stats for the drives you identify in the steps above.
>
> Regards
> JC
>
>
>
> > On Nov 20, 2014, at 06:00, Jeff wrote:
> >
> > Hi,
> >
> > We have a five node cluster that has been running for a long
> > time (over a year). A few weeks ago we upgraded to 0.87 (giant) and
> > things continued to work well.
> >
> > Last week a drive failed on one of the nodes. We replaced the
> > drive and things were working well again.
> >
> > After about six days we started getting lots of "slow
> > requests...blocked for..." messages (hundreds per hour) and performance has been
> > terrible. Since then we've made sure to have all of the latest OS patches
> > and rebooted all five nodes. We are still seeing a lot of slow
> > requests/blocked messages. Any idea(s) on what's wrong/where to look?
> >
> > Thanks!
> > Jeff
> > --
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
--
===============================================================================
Jeff's Used Movie Finder
http://www.usedmoviefinder.com
email: [email protected]