Hello,
We are using Ceph MDS with one active and one standby server. A day ago one of
the MDS daemons crashed and I restarted it.
Tonight it crashed again, and a few hours later the second MDS crashed as well.
#ceph -v
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
At the moment CephFS is down, with the following health status:
#ceph -s
    cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
     health HEALTH_WARN mds cluster is degraded; mds c is laggy
     monmap e3: 3 mons at {ceph-m-01=10.0.0.176:6789/0,ceph-m-02=10.0.1.107:6789/0,ceph-m-03=10.0.1.108:6789/0}, election epoch 6274, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03
     mdsmap e2055: 1/1/1 up {0=ceph-m-03=up:rejoin(laggy or crashed)}
     osdmap e3752: 39 osds: 39 up, 39 in
      pgmap v3277576: 8328 pgs, 17 pools, 6461 GB data, 17066 kobjects
            13066 GB used, 78176 GB / 91243 GB avail
                8328 active+clean
  client io 1193 B/s rd, 0 op/s
I couldn't find any useful information in the log files or in the
documentation. Any ideas how to get CephFS up and running again?
Here is part of the MDS log:
2014-04-16 14:07:05.603501 7ff184c64700 1 mds.0.server reconnect gave up on client.7846580 10.0.1.152:0/14639
2014-04-16 14:07:05.603525 7ff184c64700 1 mds.0.46 reconnect_done
2014-04-16 14:07:05.674990 7ff186d69700 1 mds.0.46 handle_mds_map i am now mds.0.46
2014-04-16 14:07:05.674996 7ff186d69700 1 mds.0.46 handle_mds_map state change up:reconnect --> up:rejoin
2014-04-16 14:07:05.674998 7ff186d69700 1 mds.0.46 rejoin_start
2014-04-16 14:07:22.347521 7ff17f825700 0 -- 10.0.1.107:6815/17325 >> 10.0.1.68:0/4128280551 pipe(0x5e2ac80 sd=930 :6815 s=2 pgs=153 cs=1 l=0 c=0x5e2e160).fault with nothing to send, going to standby
Any ideas how to resolve the "laggy or crashed" state?
Georg
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com