Hi, consider the following scenario:

- cluster with public and cluster networks
- three-node cluster
- 5 OSDs per node
- 1 mon per node
- two nodes attached to the same 10Gb switch, cluster network (room A)
- one node attached to another 10Gb switch, cluster network (room B)
- no redundancy between the 10Gb switches of the cluster network
- redundant public network (1Gb)

Cause: the 10Gb switch (cluster network) in room A turns off (maintenance, power loss, etc.).

Problem: on the second node only 4 of 5 OSDs are declared down, and on the first node all 5 OSDs are still declared up. Client I/O stays stuck until we manually stop the OSDs on the first node.

This is our ceph.conf configuration:

...
> public network = 10.x.x.x/24
> cluster network = 172.x.x.x/24
...
> mon osd report timeout = 15
> mon osd down out interval = 600
> ...

The doc <http://docs.ceph.com/docs/mimic/rados/configuration/network-config-ref/> says:

  If you declare a cluster network, OSDs will route heartbeat, object
  replication and recovery traffic over the cluster network. This may
  improve performance compared to using a single network. To configure
  a cluster network, add the following option to the [global] section
  of your Ceph configuration file.

So why was Ceph not able to automatically mark the isolated OSDs down?

Lorenzo

--
Lorenzo Garuti
CED MaxMara
email: [email protected]
tel: 0522 3993772 - 335 8416054
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
