Re: [ceph-users] OSDs crashing frequently

Aaron Ten Clay Mon, 31 Mar 2014 14:16:46 -0700

Greg,

I'm in the process of doing so now. joshd asked for "debug filestore = 20"
as well, and I just restarted an OSD with those changes. As soon as it
crashes again, I'll post the log file.
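
For anyone following along, the requested settings could be written in
ceph.conf roughly as below. This is only a sketch: putting them under the
general [osd] section (so they apply to every OSD on the host after a restart)
is an assumption, not necessarily how the OSD above was configured.

```
[osd]
    # Verbose OSD and messenger logging, as Greg requested
    debug osd = 20
    debug ms = 1
    # Additional filestore logging, as joshd requested
    debug filestore = 20
```

Logging at these levels grows the log files quickly, so it is probably worth
reverting once a crash has been captured.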


joshd also had me open a bug: http://tracker.ceph.com/issues/7922

Thanks,
-Aaron


On Mon, Mar 31, 2014 at 2:05 PM, Gregory Farnum <[email protected]> wrote:

> Can you reproduce this with "debug osd = 20" and "debug ms = 1" set on
> the OSD? I think we'll need that data to track down what exactly has
> gone wrong here.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Mon, Mar 31, 2014 at 1:22 PM, Aaron Ten Clay <[email protected]> wrote:
> > Hello fellow Cephers!
> >
> > Recently, before and after the update from 0.77 to 0.78, about half the OSDs
> > in my cluster crash quite frequently with 'osd/PG.cc: 5255: FAILED assert(0
> > == "we got a bad state machine event")'
> >
> > I'm not sure if this is a bug (there are some similar-sounding reports in
> > Redmine already), or a configuration/corruption issue on my cluster.
> >
> > I've got 22 OSDs on 5 hosts, running 0.78 across the board. Any pointers
> > would be appreciated! I'd like to track down and resolve the issue; it's
> > causing a lot of stalled requests from clients and seems like a
> > generally unhealthy state of being.
> >
> > Here's a fresh log file (~3MiB) from one OSD that crashed (old log moved
> > aside before restarting after crash):
> > http://www.aarontc.com/logs/ceph-osd.4.log
> >
> > Thanks,
> > -Aaron
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>



-- 
Aaron Ten Clay
http://www.aarontc.com/

