Hi!

I have a Ceph cluster with 3 nodes, each running mon/mgr/mds daemons.
I rebooted one node and saw the following in the client log:

Feb 09 20:29:14 ceph-nfs1 kernel: libceph: mon2 10.5.105.40:6789 socket closed (con state OPEN)
Feb 09 20:29:14 ceph-nfs1 kernel: libceph: mon2 10.5.105.40:6789 session lost, hunting for new mon
Feb 09 20:29:14 ceph-nfs1 kernel: libceph: mon0 10.5.105.34:6789 session established
Feb 09 20:29:22 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state OPEN)
Feb 09 20:29:23 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state CONNECTING)
Feb 09 20:29:24 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state CONNECTING)
Feb 09 20:29:24 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state CONNECTING)
Feb 09 20:29:53 ceph-nfs1 kernel: ceph: mds0 reconnect start
Feb 09 20:29:53 ceph-nfs1 kernel: ceph: mds0 reconnect success
Feb 09 20:30:05 ceph-nfs1 kernel: ceph: mds0 recovery completed

As I understand it, the following happened:
1. The client detected that the link to the mon was broken and quickly 
switched to another mon (in less than 1 second).
2. The client detected that the link to the mds was broken, tried to 
reconnect 3 times (unsuccessfully), then waited and only reconnected to the 
same mds (mds0) after about 30 seconds of downtime (see the check below).
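
I guess the delay is governed by the mds beacon/reconnect timeouts. This is 
how I would check them on one of the mds nodes -- just a sketch, the mds id 
is a placeholder and I am not sure these are even the right options:

    # <id> is a placeholder for the local mds daemon name
    ceph daemon mds.<id> config show | grep -E 'mds_beacon_grace|mds_beacon_interval|mds_reconnect_timeout'

If I read the docs correctly, the mon only marks an mds as failed after 
mds_beacon_grace (15 seconds by default), which together with the reconnect 
phase would roughly match the ~30 seconds I see.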

I have 2 questions:
1. Why is the mds reconnect so much slower than the mon switch?
2. How can I reduce the time the client needs to switch to (or reconnect to) 
an mds? For example, would something like the ceph.conf snippet below help?
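
This is only a guess based on the docs, I have not tested it, and the values 
are examples rather than recommendations:

    [global]
        # mon marks an mds failed after this many seconds without a beacon
        # (default is 15, I believe)
        mds beacon grace = 5

    [mds]
        # keep a standby mds replaying the active mds journal for a faster takeover
        mds standby replay = true

Is something like this the right way to speed up the switch, or is the 
~30 second delay simply expected?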
