Actually I didn't put any data into my Ceph cluster.

I was just trying to understand Ceph's principles by reading code and
running a test cluster. A lot of operations were performed and I can't
remember them all, so I just ignored this error message and ran mkcephfs.
But I do remember that I was focused on the interaction between Ceph roles,
so I started and stopped specific daemons (ceph-mon, ceph-mds, ceph-osd).
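The start/stop operations were roughly of this form (the exact invocations
and daemon IDs weren't recorded, so treat this as illustrative only):

```shell
# Illustrative only -- exact commands and daemon IDs were not recorded.
# With a mkcephfs-deployed 0.56 cluster, individual daemons are managed
# through the sysvinit script on each host:
/etc/init.d/ceph start mds.a
/etc/init.d/ceph stop mds.a

# Or a daemon can be run directly, pointing it at the cluster config:
ceph-mds -i a -c /etc/ceph/ceph.conf
```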

Sorry I can't provide more information :)


On Thu, Apr 11, 2013 at 11:17 PM, Gregory Farnum <[email protected]> wrote:

> That's certainly not great. Have you lost any data or removed anything
> from the cluster? It looks like perhaps your MDS log lost an object,
> and maybe got one shortened as well.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Mon, Apr 8, 2013 at 11:55 PM, x yasha <[email protected]> wrote:
> > I've been testing Ceph for a while with a 4-node cluster (1 mon, 1 mds,
> > and 2 osds), each running Ceph 0.56.2.
> >
> > Today I ran into an MDS crash: on the mds host, the ceph-mds process was
> > terminated by an assert().
> > My questions are:
> > 1. What caused the MDS crash?
> > 2. How can I fix it without running mkcephfs?
> >
> > It's reproducible in my environment.
> > The following information may be relevant:
> > 1. "ceph -s" output
> > 2. ceph.conf
> > 3. part of ceph-mds.a.log (the whole log file is at
> > http://pastebin.com/NJd0UCfF)
> >
> > 1. "ceph -s" output
> > ==============
> >    health HEALTH_WARN mds a is laggy
> >    monmap e1: 1 mons at {a=mon.mon.mon.mon:6789/0}, election epoch 1, quorum 0 a
> >    osdmap e220: 2 osds: 2 up, 2 in
> >     pgmap v3614: 576 pgs: 576 active+clean; 6618 KB data, 162 MB used, 4209 MB / 4606 MB avail
> >    mdsmap e860: 1/1/1 up {0=a=up:active(laggy or crashed)}
> >
> > 2. ceph.conf
> > =========
> > [global]
> >     auth supported = none
> >     auth cluster required = none
> >     auth service required = none
> >     auth client required = none
> >     debug mds = 20
> >
> > [mon]
> >     mon data = /usr/local/etc/ceph/mon.$id
> > [mon.a]
> >     host = mon
> >     mon addr = xx.xx.xx.xx:6789
> >
> > [mds]
> > [mds.a]
> >     host = mds
> >
> > [osd]
> >     osd data = /ceph/data
> >     osd journal size = 128
> >     filestore xattr use omap = true
> > [osd.0]
> >     host = osd0
> > [osd.1]
> >     host = osd1
> >
> > 3. part of ceph-mds.a.log
> > ==================
> > 2013-04-09 02:22:58.577485 7f587b640700  1 mds.0.35 handle_mds_map i am now mds.0.35
> > 2013-04-09 02:22:58.577489 7f587b640700  1 mds.0.35 handle_mds_map state change up:rejoin --> up:active
> > 2013-04-09 02:22:58.577494 7f587b640700  1 mds.0.35 recovery_done -- successful recovery!
> > 2013-04-09 02:22:58.577507 7f587b640700  7 mds.0.tableserver(anchortable) finish_recovery
> > 2013-04-09 02:22:58.577515 7f587b640700  7 mds.0.tableserver(snaptable) finish_recovery
> > 2013-04-09 02:22:58.577521 7f587b640700  7 mds.0.tableclient(anchortable) finish_recovery
> > 2013-04-09 02:22:58.577525 7f587b640700  7 mds.0.tableclient(snaptable) finish_recovery
> > 2013-04-09 02:22:58.577529 7f587b640700 10 mds.0.cache start_recovered_truncates
> > 2013-04-09 02:22:58.577533 7f587b640700 10 mds.0.cache do_file_recover 0 queued, 0 recovering
> > 2013-04-09 02:22:58.577541 7f587b640700 10 mds.0.cache reissue_all_caps
> > 2013-04-09 02:22:58.581855 7f587b640700 -1 mds/MDCache.cc: In function 'void MDCache::populate_mydir()' thread 7f587b640700 time 2013-04-09 02:22:58.577558
> > mds/MDCache.cc: 579: FAILED assert(mydir)
> >
> >  ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
> >  1: (MDCache::populate_mydir()+0xbc5) [0x5f0125]
> >  2: (MDS::recovery_done()+0xde) [0x4ed12e]
> >  3: (MDS::handle_mds_map(MMDSMap*)+0x39c8) [0x4fff28]
> >  4: (MDS::handle_core_message(Message*)+0xb4b) [0x50596b]
> >  5: (MDS::_dispatch(Message*)+0x2f) [0x505a9f]
> >  6: (MDS::ms_dispatch(Message*)+0x23b) [0x50759b]
> >  7: (Messenger::ms_deliver_dispatch(Message*)+0x66) [0x872a26]
> >  8: (DispatchQueue::entry()+0x32a) [0x87093a]
> >  9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ee7cd]
> >  10: (()+0x6a3f) [0x7f587f465a3f]
> >  11: (clone()+0x6d) [0x7f587df1967d]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >
> > --- begin dump of recent events ---
> >
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>