On Wednesday, March 6, 2013 at 2:43 AM, Félix Ortega Hortigüela wrote:
> Hi
> I'm running ceph v0.56.3 over debian-wheezy, with the ceph.com debs.
> My setup is three servers with 6 disks each. On each server, 5 disks are 
> dedicated to OSDs and the remaining disk is dedicated to the monitors (three, 
> one per server) and the MDSs (three, one per server, only one active at a 
> time).
> We are using cephfs from another host, mounting it with the kernel driver.
>  
> We are downloading data from ~150 servers with rsync every night, aiming for 
> 50 simultaneous rsync processes. All of this runs on the cephfs-mounted 
> filesystem.  
> The directory where we download all the data is on a pool configured with 
> min_size=2, so we have at least 2 copies of every object.
>  
> Yesterday, during our downloads, the MDS crashed. The other MDSs tried to 
> take over and then crashed as well. This morning I had some issues with some 
> stuck inactive PGs, which I resolved, but the MDS won't start. When I try to 
> start it with "service ceph start mds 10" I get this message in the 
> logfile:  
> [...]
> -3> 2013-03-06 11:24:02.304950 7f41c5afb700 10 mds.0.journal EMetaBlob.replay 
> inotable tablev 4296440 <= table 4296754
> -2> 2013-03-06 11:24:02.304952 7f41c5afb700 10 mds.0.journal EMetaBlob.replay 
> sessionmap v8546402 -(1|2) == table 8412050 prealloc [100004192ca~1] used 
> 10000418ee2
> -1> 2013-03-06 11:24:02.304956 7f41c5afb700 20 mds.0.journal (session 
> prealloc [1000040887b~3e8])
> 0> 2013-03-06 11:24:02.306239 7f41c5afb700 -1 mds/journal.cc: In function 
> 'void EMetaBlob::replay(MDS*, LogSegment*)' 
> thread 7f41c5afb700 time 2013-03-06 11:24:02.304977
> mds/journal.cc: 744: FAILED assert(i == used_preallocated_ino)

Hmm, that assert indicates that your journal is bad in a way I haven't seen 
before: the MDS recorded using an inode number that it wasn't allowed to use. 
(Or, perhaps, there's a bug somewhere else.)
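(For readers following along: here is a minimal sketch of the invariant that 
assert enforces. This is illustrative Python, not Ceph's actual C++ in 
mds/journal.cc; the inode numbers are the ones from the log excerpt above.)

```python
# Toy model of the "i == used_preallocated_ino" check in EMetaBlob::replay():
# during journal replay the MDS pops the next inode number from the session's
# preallocated interval, and it must match the inode number the journaled
# event says was actually used. A mismatch means the journal and the
# sessionmap have diverged.

class Session:
    def __init__(self, prealloc):
        # prealloc: inode numbers reserved for this session, in order
        self.prealloc = list(prealloc)

    def take_ino(self):
        # Hand out the lowest remaining preallocated inode number.
        return self.prealloc.pop(0)

def replay_event(session, used_preallocated_ino):
    i = session.take_ino()
    # This mirrors the failing assert at mds/journal.cc:744.
    assert i == used_preallocated_ino, (
        f"journal/sessionmap mismatch: next prealloc {i:#x} "
        f"!= journaled ino {used_preallocated_ino:#x}")
    return i

# Consistent journal: replay succeeds.
s = Session(prealloc=[0x1000040887b, 0x1000040887c, 0x1000040887d])
assert replay_event(s, 0x1000040887b) == 0x1000040887b

# Divergent journal, as in the crash: prealloc [100004192ca~1] but the
# journal claims ino 10000418ee2 was used, so the assert fires.
s2 = Session(prealloc=[0x100004192ca])
try:
    replay_event(s2, 0x10000418ee2)
    diverged = False
except AssertionError:
    diverged = True
assert diverged
```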
Do you have any logs of the initial MDS crash?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
>  
> ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
> 1: (EMetaBlob::replay(MDS*, LogSegment*)+0x6bd8) [0x520a78]
> 2: (EUpdate::replay(MDS*)+0x38) [0x523da8]
> 3: (MDLog::_replay_thread()+0x5cf) [0x6d1eaf]
> 4: (MDLog::ReplayThread::entry()+0xd) [0x50458d]
> 5: (()+0x6b50) [0x7f41ce2e6b50]
> 6: (clone()+0x6d) [0x7f41ccc96a7d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
> interpret this.
> [...]
>  
> I have searched for this assert (I think this is the problem, some issue 
> allocating inodes) but I didn't find anything.  
>  
> For now we don't have access to the filesystem. What can I do to start the 
> MDS again?  


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
