Actually, I didn't put any data into my Ceph cluster. I was just trying to understand Ceph's principles by reading the code and running a test cluster. I performed a lot of operations and can't remember them all, so I just ignored this error message and ran mkcephfs. I do remember that I was focused on how the Ceph roles interact, so I started and stopped specific daemons (ceph-mon, ceph-mds, ceph-osd).
Sorry I can't provide more information :)

On Thu, Apr 11, 2013 at 11:17 PM, Gregory Farnum <[email protected]> wrote:
> That's certainly not great. Have you lost any data or removed anything
> from the cluster? It looks like perhaps your MDS log lost an object,
> and maybe got one shortened as well.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Mon, Apr 8, 2013 at 11:55 PM, x yasha <[email protected]> wrote:
> > I'm testing ceph for a while with a 4 node cluster(1 mon, 1 mds, and 2
> > osds), each installed ceph 0.56.2.
> >
> > Today I ran into a mds crash case, on host mds process ceph-mds is
> > terminated by assert().
> > My questions here are:
> > 1. Reason of mds' crash.
> > 2. How to solve it without mkcephfs.
> >
> > It's reproducible in my environment.
> > Following is information may be related:
> > 1. "ceph -s" output
> > 2. ceph.conf
> > 3. part of ceph-mds.a.log (the whole log file is at
> > http://pastebin.com/NJd0UCfF)
> >
> > 1. "ceph -s" output
> > ==============
> > health HEALTH_WARN mds a is laggy
> > monmap e1: 1 mons at {a=mon.mon.mon.mon:6789/0}, election epoch 1, quorum 0 a
> > osdmap e220: 2 osds: 2 up, 2 in
> > pgmap v3614: 576 pgs: 576 active+clean; 6618 KB data, 162 MB used, 4209 MB / 4606 MB avail
> > mdsmap e860: 1/1/1 up {0=a=up:active(laggy or crashed)}
> >
> > 2. ceph.conf
> > =========
> > [global]
> > auth supported = none
> > auth cluster required = none
> > auth service required = none
> > auth client required = none
> > debug mds = 20
> >
> > [mon]
> > mon data = /usr/local/etc/ceph/mon.$id
> > [mon.a]
> > host = mon
> > mon addr = xx.xx.xx.xx:6789
> >
> > [mds]
> > [mds.a]
> > host = mds
> >
> > [osd]
> > osd data = /ceph/data
> > osd journal size = 128
> > filestore xattr use omap = true
> > [osd.0]
> > host = osd0
> > [osd.1]
> > host = osd1
> >
> > 3. part of ceph-mds.a.log
> > ==================
> > 2013-04-09 02:22:58.577485 7f587b640700 1 mds.0.35 handle_mds_map i am now mds.0.35
> > 2013-04-09 02:22:58.577489 7f587b640700 1 mds.0.35 handle_mds_map state change up:rejoin --> up:active
> > 2013-04-09 02:22:58.577494 7f587b640700 1 mds.0.35 recovery_done -- successful recovery!
> > 2013-04-09 02:22:58.577507 7f587b640700 7 mds.0.tableserver(anchortable) finish_recovery
> > 2013-04-09 02:22:58.577515 7f587b640700 7 mds.0.tableserver(snaptable) finish_recovery
> > 2013-04-09 02:22:58.577521 7f587b640700 7 mds.0.tableclient(anchortable) finish_recovery
> > 2013-04-09 02:22:58.577525 7f587b640700 7 mds.0.tableclient(snaptable) finish_recovery
> > 2013-04-09 02:22:58.577529 7f587b640700 10 mds.0.cache start_recovered_truncates
> > 2013-04-09 02:22:58.577533 7f587b640700 10 mds.0.cache do_file_recover 0 queued, 0 recovering
> > 2013-04-09 02:22:58.577541 7f587b640700 10 mds.0.cache reissue_all_caps
> > 2013-04-09 02:22:58.581855 7f587b640700 -1 mds/MDCache.cc: In function 'void MDCache::populate_mydir()' thread 7f587b640700 time 2013-04-09 02:22:58.577558
> > mds/MDCache.cc: 579: FAILED assert(mydir)
> >
> > ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
> > 1: (MDCache::populate_mydir()+0xbc5) [0x5f0125]
> > 2: (MDS::recovery_done()+0xde) [0x4ed12e]
> > 3: (MDS::handle_mds_map(MMDSMap*)+0x39c8) [0x4fff28]
> > 4: (MDS::handle_core_message(Message*)+0xb4b) [0x50596b]
> > 5: (MDS::_dispatch(Message*)+0x2f) [0x505a9f]
> > 6: (MDS::ms_dispatch(Message*)+0x23b) [0x50759b]
> > 7: (Messenger::ms_deliver_dispatch(Message*)+0x66) [0x872a26]
> > 8: (DispatchQueue::entry()+0x32a) [0x87093a]
> > 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ee7cd]
> > 10: (()+0x6a3f) [0x7f587f465a3f]
> > 11: (clone()+0x6d) [0x7f587df1967d]
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >
> > --- begin dump of recent events ---
> >
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
