Trying to remove one of the folders made mds.a and mds.b stop, so
something is wrong with my mds.
ceph -s gives:
2012-06-06 06:19:19.899973 pg v1220573: 1152 pgs: 1152 active+clean; 191 GB
data, 393 GB used, 973 GB / 1379 GB avail
2012-06-06 06:19:19.905097 mds e78: 1/1/1 up {0=c=up:active}
2012-06-06 06:19:19.905200 osd e1114: 8 osds: 8 up, 8 in
2012-06-06 06:19:19.905400 log 2012-06-06 05:51:31.499366 osd.3
10.0.6.11:6804/2933 804 : [INF] 0.c scrub ok
2012-06-06 06:19:19.905598 mon e1: 3 mons at
{a=10.0.6.10:6789/0,b=10.0.6.11:6789/0,c=10.0.6.12:6789/0}
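The mds line in the status output above is the quickest place to see whether the active rank is in up:active or still in up:replay. A minimal sketch of pulling that out of `ceph -s` text follows; the function name `mds_rank_states` is made up for illustration, and the regex only assumes the line format shown in this thread, which may differ in other Ceph versions.

```python
import re

# Matches the MDS summary line as printed by `ceph -s` in this thread, e.g.
#   mds e78: 1/1/1 up {0=c=up:active}
#   mds e60: 1/1/1 up {0=c=up:replay}, 2 up:standby
MDS_LINE = re.compile(r"mds e(\d+): .*?\{([^}]*)\}")


def mds_rank_states(status_text):
    """Return {rank: (daemon_name, state)} parsed from `ceph -s` text.

    Hypothetical helper: returns an empty dict if no mds line is found.
    """
    match = MDS_LINE.search(status_text)
    if match is None:
        return {}
    ranks = {}
    for entry in match.group(2).split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Each entry looks like "0=c=up:active": rank, daemon name, state.
        rank, name, state = entry.split("=", 2)
        ranks[int(rank)] = (name, state)
    return ranks


sample = "2012-06-06 06:19:19.905097 mds e78: 1/1/1 up {0=c=up:active}"
print(mds_rank_states(sample))
```

Feeding it the saved output of `ceph -s` would show at a glance which daemon holds rank 0 and in which state.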
I checked the log files on ceph1 and ceph2, where I have my mon.
mds.a -------------------
cessful recovery!
-2> 2012-06-06 05:38:35.956195 7f2d5ea08700 1 mds.0.12 active_start
-1> 2012-06-06 05:38:35.967760 7f2d5ea08700 1 mds.0.12 cluster recovered.
0> 2012-06-06 05:38:37.200297 7f2d5ea08700 -1 mds/AnchorServer.cc: In function
'virtual void AnchorServer::handle_query(MMDSTableRequest*)' thread
7f2d5ea08700 time 2012-06-06 05:38:37.198981
mds/AnchorServer.cc: 249: FAILED assert(anchor_map.count(curino) == 1)
ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
1: (AnchorServer::handle_query(MMDSTableRequest*)+0x175) [0x6bdc95]
2: (MDS::handle_deferrable_message(Message*)+0xd84) [0x4b0474]
3: (MDS::_dispatch(Message*)+0xaf8) [0x4c50b8]
4: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c628b]
5: (SimpleMessenger::dispatch_entry()+0x979) [0x7acb49]
6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7336ed]
7: (()+0x68ca) [0x7f2d6346e8ca]
8: (clone()+0x6d) [0x7f2d61cf692d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- end dump of recent events ---
2012-06-06 05:38:37.203277 7f2d5ea08700 -1 *** Caught signal (Aborted) **
in thread 7f2d5ea08700
ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
1: /usr/bin/ceph-mds() [0x814279]
2: (()+0xeff0) [0x7f2d63476ff0]
3: (gsignal()+0x35) [0x7f2d61c591b5]
4: (abort()+0x180) [0x7f2d61c5bfc0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f2d624eddc5]
6: (()+0xcb166) [0x7f2d624ec166]
7: (()+0xcb193) [0x7f2d624ec193]
8: (()+0xcb28e) [0x7f2d624ec28e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x940)
[0x74f9b0]
10: (AnchorServer::handle_query(MMDSTableRequest*)+0x175) [0x6bdc95]
11: (MDS::handle_deferrable_message(Message*)+0xd84) [0x4b0474]
12: (MDS::_dispatch(Message*)+0xaf8) [0x4c50b8]
13: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c628b]
14: (SimpleMessenger::dispatch_entry()+0x979) [0x7acb49]
15: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7336ed]
16: (()+0x68ca) [0x7f2d6346e8ca]
17: (clone()+0x6d) [0x7f2d61cf692d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- begin dump of recent events ---
0> 2012-06-06 05:38:37.203277 7f2d5ea08700 -1 *** Caught signal (Aborted) **
in thread 7f2d5ea08700
ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
1: /usr/bin/ceph-mds() [0x814279]
2: (()+0xeff0) [0x7f2d63476ff0]
3: (gsignal()+0x35) [0x7f2d61c591b5]
4: (abort()+0x180) [0x7f2d61c5bfc0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f2d624eddc5]
6: (()+0xcb166) [0x7f2d624ec166]
7: (()+0xcb193) [0x7f2d624ec193]
8: (()+0xcb28e) [0x7f2d624ec28e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x940)
[0x74f9b0]
10: (AnchorServer::handle_query(MMDSTableRequest*)+0x175) [0x6bdc95]
11: (MDS::handle_deferrable_message(Message*)+0xd84) [0x4b0474]
12: (MDS::_dispatch(Message*)+0xaf8) [0x4c50b8]
13: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c628b]
14: (SimpleMessenger::dispatch_entry()+0x979) [0x7acb49]
15: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7336ed]
16: (()+0x68ca) [0x7f2d6346e8ca]
17: (clone()+0x6d) [0x7f2d61cf692d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- end dump of recent events ---
ceph -v reports the following on my different servers:
root@ceph1:~# ceph -v
ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
root@ceph1:~# ssh ceph2 ceph -v
ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
root@ceph1:~# ssh ceph3 ceph -v
ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
root@ceph1:~# ssh ceph4 ceph -v
ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
Is the 0.46 above the version that was running when the error occurred, or am I
running the wrong binaries? I use the Debian packages.
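One way to see which version strings a daemon log actually recorded is to count them. This is only a sketch: it assumes the log text has already been read into a string (log file locations depend on the setup), and `versions_in_log` is a hypothetical name, not a Ceph tool.

```python
import re
from collections import Counter

# Matches lines such as
#   ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
VERSION = re.compile(r"ceph version (\S+) \(commit:([0-9a-f]+)\)")


def versions_in_log(log_text):
    """Count each distinct 'ceph version X' string found in the log text."""
    return Counter(m.group(1) for m in VERSION.finditer(log_text))


sample = """\
 ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
 ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
 ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
"""
print(versions_in_log(sample))
```

Running it separately on the mds.a and mds.b logs would show whether 0.46 appears only in mds.a's log, which is what the two traces in this mail suggest.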
mds.b -------------------
0> 2012-06-06 05:38:17.533743 7fae49945700 -1 mds/AnchorServer.cc: In function
'virtual void AnchorServer::handle_query(MMDSTableRequest*)' thread
7fae49945700 time 2012-06-06 05:38:17.523498
mds/AnchorServer.cc: 249: FAILED assert(anchor_map.count(curino) == 1)
ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
1: (AnchorServer::handle_query(MMDSTableRequest*)+0x175) [0x6c1125]
2: (MDS::handle_deferrable_message(Message*)+0xd84) [0x4b1984]
3: (MDS::_dispatch(Message*)+0xafa) [0x4c61da]
4: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c73ab]
5: (SimpleMessenger::dispatch_entry()+0x979) [0x7b4729]
6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7365cd]
7: (()+0x68ca) [0x7fae4e3ab8ca]
8: (clone()+0x6d) [0x7fae4cc3392d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- end dump of recent events ---
2012-06-06 05:38:17.711889 7fae49945700 -1 *** Caught signal (Aborted) **
in thread 7fae49945700
ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
1: /usr/bin/ceph-mds() [0x81da89]
2: (()+0xeff0) [0x7fae4e3b3ff0]
3: (gsignal()+0x35) [0x7fae4cb961b5]
4: (abort()+0x180) [0x7fae4cb98fc0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fae4d42adc5]
6: (()+0xcb166) [0x7fae4d429166]
7: (()+0xcb193) [0x7fae4d429193]
8: (()+0xcb28e) [0x7fae4d42928e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x940)
[0x7555f0]
10: (AnchorServer::handle_query(MMDSTableRequest*)+0x175) [0x6c1125]
11: (MDS::handle_deferrable_message(Message*)+0xd84) [0x4b1984]
12: (MDS::_dispatch(Message*)+0xafa) [0x4c61da]
13: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c73ab]
14: (SimpleMessenger::dispatch_entry()+0x979) [0x7b4729]
15: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7365cd]
16: (()+0x68ca) [0x7fae4e3ab8ca]
17: (clone()+0x6d) [0x7fae4cc3392d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- begin dump of recent events ---
0> 2012-06-06 05:38:17.711889 7fae49945700 -1 *** Caught signal (Aborted) **
in thread 7fae49945700
ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
1: /usr/bin/ceph-mds() [0x81da89]
2: (()+0xeff0) [0x7fae4e3b3ff0]
3: (gsignal()+0x35) [0x7fae4cb961b5]
4: (abort()+0x180) [0x7fae4cb98fc0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fae4d42adc5]
6: (()+0xcb166) [0x7fae4d429166]
7: (()+0xcb193) [0x7fae4d429193]
8: (()+0xcb28e) [0x7fae4d42928e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x940)
[0x7555f0]
10: (AnchorServer::handle_query(MMDSTableRequest*)+0x175) [0x6c1125]
11: (MDS::handle_deferrable_message(Message*)+0xd84) [0x4b1984]
12: (MDS::_dispatch(Message*)+0xafa) [0x4c61da]
13: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c73ab]
14: (SimpleMessenger::dispatch_entry()+0x979) [0x7b4729]
15: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7365cd]
16: (()+0x68ca) [0x7fae4e3ab8ca]
17: (clone()+0x6d) [0x7fae4cc3392d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- end dump of recent events ---
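Both traces fail on the same assertion, and each log repeats it several times. A small sketch for listing the distinct failed assertions in a log, assuming the `file: line: FAILED assert(...)` format seen above; `failed_asserts` is a hypothetical helper name.

```python
import re

# Matches lines such as
#   mds/AnchorServer.cc: 249: FAILED assert(anchor_map.count(curino) == 1)
ASSERT = re.compile(
    r"^\s*(\S+): (\d+): FAILED assert\((.*)\)\s*$", re.MULTILINE
)


def failed_asserts(log_text):
    """Return unique (source_file, line, condition) tuples in first-seen order."""
    seen = []
    for m in ASSERT.finditer(log_text):
        item = (m.group(1), int(m.group(2)), m.group(3))
        if item not in seen:
            seen.append(item)
    return seen


sample = """\
mds/AnchorServer.cc: 249: FAILED assert(anchor_map.count(curino) == 1)
 ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
mds/AnchorServer.cc: 249: FAILED assert(anchor_map.count(curino) == 1)
"""
print(failed_asserts(sample))
```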
> For future reference, that error was because the active MDS server was in
> replay. I can't tell why it didn't move on to active from what you posted,
> but I imagine it just got a little stuck since restarting made it work out.
> -Greg
>
>
> On Tuesday, June 5, 2012 at 1:05 PM, Martin Wilderoth wrote:
>
> > Hello Again,
> >
> > I restarted the mds on all servers and then it worked again
> >
> > /Regards Martin
> >
> > > Hello
> > >
> > > > Hi Martin,
> > > >
> > > > On 06/05/2012 08:07 PM, Martin Wilderoth wrote:
> > > > > Hello
> > > > >
> > > > > Is there a way to recover this error.
> > > > >
> > > > > mount -t ceph 10.0.6.10:/ /mnt -vv -o
> > > > > name=admin,secret=XXXXXXXXXXXXXXXXXXXXXXX
> > > > > [ 506.640433] libceph: loaded (mon/osd proto 15/24, osdmap 5/6 5/6)
> > > > > [ 506.650594] ceph: loaded (mds proto 32)
> > > > > [ 506.652353] libceph: client0 fsid
> > > > > a9d5f9e1-4bb9-4fab-b79b-ba4457631b01
> > > > > [ 506.670876] Intel AES-NI instructions are not detected.
> > > > > [ 506.678861] libceph: mon0 10.0.6.10:6789 session established
> > > > > mount: 10.0.6.10:/: can't read superblock
> > > >
> > > >
> > > >
> > > > Could you share some more information? For example the output from:
> > > > ceph -s
> > >
> > > 2012-06-05 20:25:05.307914 pg v1189604: 1152 pgs: 1152 active+clean; 191
> > > GB data, 393 GB used, 973 GB / 1379 GB avail
> > > 2012-06-05 20:25:05.315871 mds e60: 1/1/1 up {0=c=up:replay}, 2 up:standby
> > > 2012-06-05 20:25:05.315965 osd e1106: 8 osds: 8 up, 8 in
> > > 2012-06-05 20:25:05.316165 log 2012-06-05 20:24:50.425527 mon.0
> > > 10.0.6.10:6789/0 75 : [INF] mds.? 10.0.6.11:6800/22974 up:boot
> > > 2012-06-05 20:25:05.316371 mon e1: 3 mons at
> > > {a=10.0.6.10:6789/0,b=10.0.6.11:6789/0,c=10.0.6.12:6789/0}
> > >
> > >
> > > >
> > > > Did you change anything to the cluster since it worked? And what
> > > > version
> > > > are you running?
> > >
> > >
> > >
> > > I have not made any changes. I installed at version 0.46, upgraded earlier,
> > > and have been testing with ceph, ceph-fuse and backuppc. It hung while I
> > > was using ceph-fuse.
> > >
> > > Current version
> > > ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
> > >
> > > > > One of my mds logs has 24G of data.
> > > >
> > > > Is it still running?
> > > I have restarted mds.a and mds.b and they seem to be running, but not
> > > everything is working.
> > > mds.a was stopped; I am not sure about mds.b, but it has a big logfile.
> > >
> > > >
> > > > >
> > > > > I have some rbd devices that I would like to keep.
> > > >
> > > > RBD doesn't use the MDS nor the POSIX filesystem, so you will probably
> > > > be fine, but we need the output of "ceph -s" first.
> > > >
> > > > Does this work?
> > > > $ rbd ls
> > >
> > >
> > > This works; I'm still using the rbd with no problem.
> > > > $ rados -p rbd ls
> > >
> > >
> > > Seems to work; it reports something similar to:
> > > rb.0.2.00000000052e
> > > rb.0.0.0000000002f2
> > > rb.0.7.000000000345
> > > rb.0.7.000000000896
> > > rb.0.0.000000000102
> > > rb.0.9.000000000172
> > > rb.0.1.000000000350
> > > rb.0.4.000000000180
> > > rb.0.4.00000000068b
> > > rb.0.5.00000000054c
> > > rb.0.2.0000000001e1
> > >
> > > > Wido
> > > >
> > > > >
> > > > > /Regards Martin
Regards / Med Vänlig Hälsning
Martin Wilderoth
VD
Linserv AB
Enhagsslingan 4A
SE-187 40 Täby
www.linserv.se
Tel: +46(0)8-473 60 63
Fax: +46(0)70-969 09 19
Email: [email protected]