Re: mds crash

Sage Weil Tue, 19 Apr 2011 09:14:47 -0700

Hi Mark,

This should be fixed by d55399ffec224206ea324e83bb8ead1e9ca1eddc in the 
'next' branch of ceph.git.  Can you test it out and see if that allows 
journal replay to complete?
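
For context on the assert in the trace below: elist is an intrusive list, and the item destructor checks that the item has already been unlinked, since destroying a linked item would leave dangling pointers in the list. Here is a minimal standalone sketch of that pattern. It is not the code from include/elist.h and does not show what the commit above changes; `Item`, `List`, `Update`, and `release` are made-up names for illustration only.

```cpp
#include <cassert>

// Hypothetical, simplified intrusive list node (not the real elist<T>::item).
struct Item {
    Item* prev = nullptr;
    Item* next = nullptr;

    // An item is considered linked as soon as it has a neighbour.
    bool is_on_list() const { return prev != nullptr; }

    // Unlink from the list we are on; returns false if we were not linked.
    bool remove_myself() {
        if (!is_on_list()) return false;
        prev->next = next;
        next->prev = prev;
        prev = next = nullptr;
        return true;
    }

    // Same invariant as the failed assert in the trace: an item must be
    // unlinked before it is destroyed.
    ~Item() { assert(!is_on_list()); }
};

// Hypothetical circular list with a sentinel head.
struct List {
    Item head;

    List() { head.prev = head.next = &head; }

    // The list must be empty before it goes away; clear the sentinel's
    // self-links so its own destructor check passes.
    ~List() {
        assert(empty());
        head.prev = head.next = nullptr;
    }

    bool empty() const { return head.next == &head; }

    void push_back(Item* i) {
        assert(!i->is_on_list());
        i->prev = head.prev;
        i->next = &head;
        head.prev->next = i;
        head.prev = i;
    }
};

// Hypothetical owner object that embeds a list hook.
struct Update {
    int id;
    Item link;
    explicit Update(int i) : id(i) {}
};

// Safe teardown order: unlink first, then destroy.
void release(Update* u) {
    u->link.remove_myself();
    delete u;
}
```

Calling `delete` on an `Update` whose `link` is still on a `List` trips the destructor assert, which is the kind of failure the trace reports; going through `release()` avoids it in this sketch.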


Thanks!
sage

http://tracker.newdream.net/issues/1019



On Tue, 19 Apr 2011, Mark Nigh wrote:

> I have recently been working on exporting ceph over NFS. I have had stability
> problems with NFS (ceph keeps working but NFS crashes). Most recently, my
> mds0 will not start after one of these NFS incidents.
> 
> My setup: 2 mds, 1 mon (located on mds0), 5 osds, all running Ubuntu 10.10.
> 
> Here is the output when I try to start mds0. Is there any other debugging I
> can turn on?
> 
> /etc/init.d/ceph start mds0
> 
> 2011-04-19 10:06:58.602640 7fb202fe4700 mds0.11 ms_handle_connect on 10.6.1.93:6800/945
> ./include/elist.h: In function 'elist<T>::item::~item() [with T = MDSlaveUpdate*]', in thread '0x7fb2004d5700'
> ./include/elist.h: 39: FAILED assert(!is_on_list())
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
>  2: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
>  3: (MDLog::_replay_thread()+0xb90) [0x67f850]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
>  5: (()+0x7971) [0x7fb20564a971]
>  6: (clone()+0x6d) [0x7fb2042e692d]
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
>  2: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
>  3: (MDLog::_replay_thread()+0xb90) [0x67f850]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
>  5: (()+0x7971) [0x7fb20564a971]
>  6: (clone()+0x6d) [0x7fb2042e692d]
> *** Caught signal (Aborted) **
>  in thread 0x7fb2004d5700
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: /usr/bin/cmds() [0x70fc38]
>  2: (()+0xfb40) [0x7fb205652b40]
>  3: (gsignal()+0x35) [0x7fb204233ba5]
>  4: (abort()+0x180) [0x7fb2042376b0]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb204ad76bd]
>  6: (()+0xb9906) [0x7fb204ad5906]
>  7: (()+0xb9933) [0x7fb204ad5933]
>  8: (()+0xb9a3e) [0x7fb204ad5a3e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x36a) [0x6f5eaa]
>  10: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
>  11: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
>  12: (MDLog::_replay_thread()+0xb90) [0x67f850]
>  13: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
>  14: (()+0x7971) [0x7fb20564a971]
>  15: (clone()+0x6d) [0x7fb2042e692d]
> 
> I am not sure why the IP address 0.0.0.0 shows up when starting mds0.
> 
> root@mds0:/var/log/ceph# /etc/init.d/ceph start mds0
> === mds.0 ===
> Starting Ceph mds.0 on mds0...
>  ** WARNING: Ceph is still under heavy development, and is only suitable for **
>  **          testing and review.  Do not trust it with important data.       **
> starting mds.0 at 0.0.0.0:6800/2994
> 
> Thanks for your assistance.
> 
> Mark Nigh
> Systems Architect
> [email protected]
>  (p) 314.392.6926
> 
> 
> 
> 
> This transmission and any attached files are privileged, confidential or 
> otherwise the exclusive property of the intended recipient or Netelligent 
> Corporation. If you are not the intended recipient, any disclosure, copying, 
> distribution or use of any of the information contained in or attached to 
> this transmission is strictly prohibited. If you have received this 
> transmission in error, please contact us immediately by responding to this 
> message or by telephone (314-392-6900) and promptly destroy the original 
> transmission and its attachments.
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
