Hello,

I am running Ceph MDS with one active and one standby server. A day ago one of the MDS daemons crashed and I restarted it.
Tonight it crashed again, and a few hours later the second MDS crashed as well.

# ceph -v
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)

At the moment CephFS is down, with the following health status:

# ceph -s
    cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
     health HEALTH_WARN mds cluster is degraded; mds c is laggy
     monmap e3: 3 mons at {ceph-m-01=10.0.0.176:6789/0,ceph-m-02=10.0.1.107:6789/0,ceph-m-03=10.0.1.108:6789/0}, election epoch 6274, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03
     mdsmap e2055: 1/1/1 up {0=ceph-m-03=up:rejoin(laggy or crashed)}
     osdmap e3752: 39 osds: 39 up, 39 in
      pgmap v3277576: 8328 pgs, 17 pools, 6461 GB data, 17066 kobjects
            13066 GB used, 78176 GB / 91243 GB avail
                8328 active+clean
  client io 1193 B/s rd, 0 op/s

I couldn't find anything useful in the log files or in the documentation. Any ideas how to get CephFS up and running again?

Here is part of the MDS log:
2014-04-16 14:07:05.603501 7ff184c64700 1 mds.0.server reconnect gave up on client.7846580 10.0.1.152:0/14639
2014-04-16 14:07:05.603525 7ff184c64700  1 mds.0.46 reconnect_done
2014-04-16 14:07:05.674990 7ff186d69700 1 mds.0.46 handle_mds_map i am now mds.0.46
2014-04-16 14:07:05.674996 7ff186d69700 1 mds.0.46 handle_mds_map state change up:reconnect --> up:rejoin
2014-04-16 14:07:05.674998 7ff186d69700  1 mds.0.46 rejoin_start
2014-04-16 14:07:22.347521 7ff17f825700 0 -- 10.0.1.107:6815/17325 >> 10.0.1.68:0/4128280551 pipe(0x5e2ac80 sd=930 :6815 s=2 pgs=153 cs=1 l=0 c=0x5e2e160).fault with nothing to send, going to standby

Any ideas how to resolve the "laggy or crashed" state?


Georg
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
