Greg, I'm in the process of doing so now. joshd asked for "debug filestore = 20" as well, and I just restarted an OSD with those changes. As soon as it crashes again, I'll post the log file.
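For anyone following along, the three debug settings mentioned in this thread could be written in ceph.conf roughly as below. This is only a sketch: placing them under the [osd] section (so they apply to every OSD that reads this file) is an assumption, not something stated in the thread, and the OSD has to be restarted to pick them up from the file.

    [osd]
        # verbose OSD logging, requested by Greg
        debug osd = 20
        # messenger logging, requested by Greg
        debug ms = 1
        # filestore logging, requested by joshd
        debug filestore = 20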
joshd also had me open a bug: http://tracker.ceph.com/issues/7922

Thanks,
-Aaron

On Mon, Mar 31, 2014 at 2:05 PM, Gregory Farnum <[email protected]> wrote:

> Can you reproduce this with "debug osd = 20" and "debug ms = 1" set on
> the OSD? I think we'll need that data to track down what exactly has
> gone wrong here.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Mon, Mar 31, 2014 at 1:22 PM, Aaron Ten Clay <[email protected]> wrote:
> > Hello fellow Cephers!
> >
> > Recently, before and after the update from 0.77 to 0.78, about half the OSDs
> > in my cluster crash quite frequently with 'osd/PG.cc: 5255: FAILED assert(0
> > == "we got a bad state machine event")'
> >
> > I'm not sure if this is a bug (there are some similar-sounding reports in
> > Redmine already), or a configuration/corruption issue on my cluster.
> >
> > I've got 22 OSDs on 5 hosts, running 0.78 across the board. Any pointers
> > would be appreciated! I'd like to track down and resolve the issue, it's
> > causing a lot of stalled requests from clients and seems like a
> > generally-unhealthy state of being.
> >
> > Here's a fresh log file (~3MiB) from one OSD that crashed (old log moved
> > aside before restarting after crash):
> > http://www.aarontc.com/logs/ceph-osd.4.log
> >
> > Thanks,
> > -Aaron
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >

--
Aaron Ten Clay
http://www.aarontc.com/
