[ceph-users] Crashed MDS not starting

Félix Ortega Hortigüela Wed, 06 Mar 2013 03:41:46 -0800

Hi
I'm running Ceph v0.56.3 on Debian wheezy, using the debs from ceph.com.
My setup is three servers with 6 disks each. On each server, 5 disks are
dedicated to OSDs and the remaining disk is dedicated to the monitors (three,
one per server) and the MDSs (three, one per server, only one
active at a time).
We are using CephFS from another host, mounting it with the kernel driver.


We download data from ~150 servers with rsync every night, trying to keep
50 rsync processes running simultaneously. All of them write to the
CephFS-exported filesystem.
The directory we download the data into is on a pool configured
with min_size=2, so we have at least 2 copies of every object.
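
For readers who want to reproduce a similar load, here is a minimal Python sketch of one way to cap the number of simultaneous rsync processes at 50. It is not necessarily how the downloads described above are actually driven. The host names, the /data/ source path, the /mnt/cephfs/backups destination and the rsync flags are all placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 50  # the limit mentioned in the post


def rsync_one(host, dest_root="/mnt/cephfs/backups"):
    """Pull one server's data. Paths and flags are hypothetical placeholders."""
    cmd = ["rsync", "-a", f"{host}:/data/", f"{dest_root}/{host}/"]
    # Return the exit code instead of raising, so one failed host
    # does not abort the rest of the nightly run.
    return host, subprocess.run(cmd).returncode


def run_all(hosts, worker=rsync_one, limit=MAX_PARALLEL):
    """Call worker(host) for every host, with at most `limit` running at once."""
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # pool.map keeps results in the same order as `hosts`.
        return list(pool.map(worker, hosts))
```

Each worker thread just waits for its rsync child process, so the thread pool size is effectively the process limit.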

Yesterday, while the downloads were running, the MDS crashed. The other MDSs
tried to take over and then crashed as well. This morning I had some
stuck inactive PGs, which I have resolved, but the MDS still won't
start. When I try to start it with "service ceph start mds 10" I get this
message in the logfile:
[...]
    -3> 2013-03-06 11:24:02.304950 7f41c5afb700 10 mds.0.journal EMetaBlob.replay inotable tablev 4296440 <= table 4296754
    -2> 2013-03-06 11:24:02.304952 7f41c5afb700 10 mds.0.journal EMetaBlob.replay sessionmap v8546402 -(1|2) == table 8412050 prealloc [100004192ca~1] used 10000418ee2
    -1> 2013-03-06 11:24:02.304956 7f41c5afb700 20 mds.0.journal  (session prealloc [1000040887b~3e8])
     0> 2013-03-06 11:24:02.306239 7f41c5afb700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*)' thread 7f41c5afb700 time 2013-03-06 11:24:02.304977
mds/journal.cc: 744: FAILED assert(i == used_preallocated_ino)

 ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
 1: (EMetaBlob::replay(MDS*, LogSegment*)+0x6bd8) [0x520a78]
 2: (EUpdate::replay(MDS*)+0x38) [0x523da8]
 3: (MDLog::_replay_thread()+0x5cf) [0x6d1eaf]
 4: (MDLog::ReplayThread::entry()+0xd) [0x50458d]
 5: (()+0x6b50) [0x7f41ce2e6b50]
 6: (clone()+0x6d) [0x7f41ccc96a7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
[...]

I have searched for the assert failure (I think this is the cause, some
problem allocating inodes) but I didn't find anything.
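
To make the numbers in the last log lines easier to compare, here is a minimal Python sketch that pulls them out. It assumes that `[start~length]` is a hexadecimal interval, that the `used` value is a hexadecimal inode number, and that the sessionmap versions are decimal. Those are readings of the log format, not something confirmed here. The function names are made up for illustration.

```python
import re

# The two relevant lines from the MDS log above (wrapped lines rejoined).
JOURNAL_LINE = (
    "-2> 2013-03-06 11:24:02.304952 7f41c5afb700 10 mds.0.journal "
    "EMetaBlob.replay sessionmap v8546402 -(1|2) == table 8412050 "
    "prealloc [100004192ca~1] used 10000418ee2"
)
SESSION_LINE = (
    "-1> 2013-03-06 11:24:02.304956 7f41c5afb700 20 mds.0.journal  "
    "(session prealloc [1000040887b~3e8])"
)


def parse_interval(text):
    """Parse the first '[start~length]' in text into (start, end_exclusive).

    Both fields are treated as hexadecimal (an assumption).
    """
    m = re.search(r"\[([0-9a-f]+)~([0-9a-f]+)\]", text)
    if m is None:
        raise ValueError("no [start~length] interval found")
    start = int(m.group(1), 16)
    return start, start + int(m.group(2), 16)


def summarize(journal_line, session_line):
    """Collect the versions and inode numbers the failed assert involves."""
    versions = re.search(r"sessionmap v(\d+) .*table (\d+)", journal_line)
    used = int(re.search(r"used ([0-9a-f]+)", journal_line).group(1), 16)
    lo, hi = parse_interval(session_line)
    return {
        "journal_sessionmap_v": int(versions.group(1)),
        "table_sessionmap_v": int(versions.group(2)),
        "used_ino": used,
        "session_range": (lo, hi),
        # Is the inode the journal says was used inside the range
        # the session table says was preallocated?
        "used_in_session_range": lo <= used < hi,
    }


if __name__ == "__main__":
    for key, value in summarize(JOURNAL_LINE, SESSION_LINE).items():
        print(key, value)
```

Run against the two lines above, it reports that the journal's sessionmap version is 134352 ahead of the table, and that the used inode is not inside the session's preallocated range. That would be consistent with the failed assert, but it does not say how the two got out of sync.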

Right now we don't have access to the filesystem. What can I do to get the
MDS started again?

Thanks in advance.
--
Félix Ortega Hortigüela

