On 22/05/2015 15:33, Adam Tygart wrote:
> Hello all,
>
> The ceph-mds servers in our cluster are stuck in a constant
> boot->replay->crash cycle.
>
> I have enabled debug logging for the mds for a restart cycle on one of
> the nodes[1].

You found a bug, or more correctly, you probably found multiple bugs...

It looks like your journal contains an EOpen event that lists 5307092 open files. Because the MDS only drops its lock between events, not while processing a single one, replaying that one huge event is causing the heartbeat map to think the MDS has locked up, so it's getting killed.

So firstly, we need to fix the MDS to make appropriate calls into MDS::heartbeat_reset while iterating over lists of unbounded length in EMetaBlob::replay. That would fix the false death of the MDS resulting from the heartbeat expiry.

Secondly, this EOpen was a 2.6GB log event. Something has almost certainly gone wrong when we see that data structure grow so large, so we should really be imposing an artificial cap there and catching the situation earlier, rather than journalling this monster event and only hitting issues during replay.

Thirdly, something is apparently leading the MDS to think that 5 million files were open in this particular log segment. That seems improbable, given that I can only see a single client in action here, so more investigation is needed to work out how this happened. Can you describe the client workload that was going on in the run-up to the system breaking?

Anyway, actions:

1. I'm assuming your metadata is not sensitive, as you have shared this debug log. Please could you use "cephfs-journal-tool journal export ~/journal.bin" to grab an offline copy of the raw journal, in case we need to look at it later (this might take a while since your journal seems so large, but the resulting file should compress reasonably well with "tar cSzf").
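Something along these lines should do it (the file names here are just examples, put it wherever you have space):

    cephfs-journal-tool journal export ~/journal.bin
    tar cSzf ~/journal.bin.tgz -C ~ journal.bin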

2. Optimistically, you may be able to get out of this situation by setting the mds_beacon_grace config option on the MDS to something very high. This will cause the MDS to continue sending beacons to the mons even when a thread is failing to yield promptly (as in this case), thereby preventing the mons from regarding the MDS as failed. Hopefully that will buy the MDS enough time to complete replay and come up, assuming it doesn't run out of memory in the process of dealing with whatever strangeness is in the journal.
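For example, something like this in ceph.conf on the MDS host before restarting it (the value is arbitrary, just pick something comfortably longer than you expect replay to take):

    [mds]
        mds beacon grace = 600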

3. If your MDS eventually makes it through recovery, unmount your client and use "ceph daemon mds.<id> flush journal" to flush and trim the journal. The next time the MDS starts, the oversized journal entries should no longer be present and startup should go smoothly.
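For example, on the MDS host (assuming the admin socket is in its default location, and substituting your MDS name for <id>):

    ceph daemon mds.<id> flush journal

If you want a quick sanity check afterwards, "cephfs-journal-tool journal inspect" should report on the journal's integrity.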

Cheers,
John