On Wednesday, March 6, 2013 at 2:43 AM, Félix Ortega Hortigüela wrote:
> Hi,
> I'm running ceph v0.56.3 over debian-wheezy, with the ceph.com debs.
> My setup is three servers with 6 disks each. On each server, 5 disks are
> dedicated to osds, and the remaining disk is dedicated to the monitors
> (three, one per server) and the mds's (three, one per server, only one
> active at a time).
> We are using cephfs from another host, mounting it with the kernel driver.
>
> We download data from ~150 servers with rsync every night, trying to keep
> 50 simultaneous rsync processes running. All of this runs on the
> cephfs-exported filesystem.
> The directory we download all the data into is on a pool configured with
> min_size=2, so we have at least 2 copies of every object.
>
> Yesterday, during our downloads, the mds crashed. The other mds's tried to
> start and then crashed as well. This morning I had some issues with some
> stuck inactive pgs, which I have resolved, but the mds won't start. When I
> try to start it with "service ceph start mds 10" I get this message in the
> logfile:
> [...]
>     -3> 2013-03-06 11:24:02.304950 7f41c5afb700 10 mds.0.journal EMetaBlob.replay inotable tablev 4296440 <= table 4296754
>     -2> 2013-03-06 11:24:02.304952 7f41c5afb700 10 mds.0.journal EMetaBlob.replay sessionmap v8546402 -(1|2) == table 8412050 prealloc [100004192ca~1] used 10000418ee2
>     -1> 2013-03-06 11:24:02.304956 7f41c5afb700 20 mds.0.journal (session prealloc [1000040887b~3e8])
>      0> 2013-03-06 11:24:02.306239 7f41c5afb700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*)' thread 7f41c5afb700 time 2013-03-06 11:24:02.304977
> mds/journal.cc: 744: FAILED assert(i == used_preallocated_ino)
Hmm, that assert indicates that your journal is bad in a way I haven't seen before — it indicates that the MDS recorded using an inode number that it wasn't allowed to use. (Or, perhaps, there's a bug somewhere else.) Do you have any logs of the initial MDS crash?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

> ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
>  1: (EMetaBlob::replay(MDS*, LogSegment*)+0x6bd8) [0x520a78]
>  2: (EUpdate::replay(MDS*)+0x38) [0x523da8]
>  3: (MDLog::_replay_thread()+0x5cf) [0x6d1eaf]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x50458d]
>  5: (()+0x6b50) [0x7f41ce2e6b50]
>  6: (clone()+0x6d) [0x7f41ccc96a7d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> [...]
>
> I have searched for the assert (I think this is the problem: some issue
> allocating inodes) but I didn't find anything.
>
> For now we don't have access to the filesystem. What can I do to start the
> mds again?

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
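[Since the reply above asks for logs of the initial MDS crash: one way to capture more detail on the next crash is to raise the MDS debug levels in ceph.conf on the mds hosts before restarting. A minimal sketch, assuming the standard [mds] section; the levels shown are just an aggressive choice for debugging, not a recommendation:]

```ini
[mds]
    ; verbose MDS and journaler logging; this generates very large
    ; log files, so revert it once the crash has been captured
    debug mds = 20
    debug journaler = 20
    debug ms = 1
```

[The same settings can also be injected into a running mds with `ceph mds tell <id> injectargs '--debug-mds 20 --debug-journaler 20'`, avoiding a restart.]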
