Hi,

Consider the following scenario:

   - cluster with separate public and cluster networks
   - three-node cluster
   - 5 OSDs per node
   - 1 MON per node
   - two nodes attached to the same 10Gb switch on the cluster network (room A)
   - one node attached to another 10Gb switch on the cluster network (room B)
   - no redundancy between the 10Gb switches on the cluster network
   - redundant public network (1Gb)

Cause:

The 10Gb switch (cluster network) in room A goes down (maintenance, power
loss, etc.).
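To make the failure easier to reason about, here is a minimal Python sketch of the cluster-network reachability in this scenario. It is only a model, under assumptions not stated in the setup above: the node names `node1`/`node2`/`node3` are hypothetical, and "no redundancy between the switches" is read as a single inter-switch link between switch A and switch B.

```python
"""Hypothetical model of the cluster-network topology described above.

Assumptions (not from the original setup): node names are made up, and
switch A and switch B are joined by a single, non-redundant link.
"""

# Which cluster-network switch each node is plugged into.
NODE_SWITCH = {"node1": "A", "node2": "A", "node3": "B"}
# Inter-switch links; there is only one, so no redundancy.
SWITCH_LINKS = {frozenset({"A", "B"})}


def reachable(n1: str, n2: str, failed: set) -> bool:
    """True if n1 and n2 can exchange traffic on the cluster network."""
    s1, s2 = NODE_SWITCH[n1], NODE_SWITCH[n2]
    if s1 in failed or s2 in failed:
        return False  # a node behind a dead switch is cut off entirely
    return s1 == s2 or frozenset({s1, s2}) in SWITCH_LINKS


def cluster_peers(node: str, failed: set) -> set:
    """Other nodes this node can still reach on the cluster network."""
    return {n for n in NODE_SWITCH if n != node and reachable(node, n, failed)}


if __name__ == "__main__":
    # Simulate switch A going down and list each node's remaining peers.
    for node in sorted(NODE_SWITCH):
        print(node, sorted(cluster_peers(node, {"A"})))
```

In this model, with switch A down, every node loses all of its cluster-network peers (node3 included, since its only path to the others goes through switch A), while the public network, which is not modelled here, stays up.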

Problem:

Only 4 of the 5 OSDs on the second node were marked down, and all 5 OSDs on
the first node stayed up.
Client I/O hung until we manually stopped the OSDs on the first node.

This is our ceph.conf configuration:

    ...
    public network = 10.x.x.x/24
    cluster network = 172.x.x.x/24
    ...
    mon osd report timeout = 15
    mon osd down out interval = 600
    ...
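As a rough illustration of what the two timeouts above are meant to do, here is a simplified Python model. Per the Ceph configuration reference, `mon osd report timeout` is the grace period in seconds before the monitors declare an unresponsive OSD down, and `mon osd down out interval` is how long Ceph waits before also marking a down OSD out. The function `osd_state` is a hypothetical helper; it looks only at how long the monitor has gone without hearing from an OSD and deliberately ignores peer heartbeat failure reports, so it is not how Ceph actually decides.

```python
from dataclasses import dataclass


@dataclass
class MonTimeouts:
    # Values from the ceph.conf excerpt above, in seconds.
    report_timeout: float = 15      # mon osd report timeout
    down_out_interval: float = 600  # mon osd down out interval


def osd_state(seconds_since_last_report: float, t: MonTimeouts) -> str:
    """Hypothetical, simplified classification of one OSD.

    Only the time since the OSD last reported to the monitor is
    considered; peer heartbeat reports are ignored in this sketch.
    """
    if seconds_since_last_report <= t.report_timeout:
        return "up"
    # How long the OSD has been marked down so far.
    down_for = seconds_since_last_report - t.report_timeout
    if down_for < t.down_out_interval:
        return "down"
    return "down+out"


if __name__ == "__main__":
    timeouts = MonTimeouts()
    for elapsed in (10, 20, 614, 615):
        print(elapsed, osd_state(elapsed, timeouts))
```

Note that an OSD whose cluster network is dead can still reach the monitors over the public network, so in this simplified model it would keep reporting and remain "up".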


The documentation
<http://docs.ceph.com/docs/mimic/rados/configuration/network-config-ref/>
says:

If you declare a cluster network, OSDs will route heartbeat, object
replication and recovery traffic over the cluster network. This may improve
performance compared to using a single network. To configure a cluster
network, add the following option to the [global] section of your Ceph
configuration file.

So why was Ceph not able to automatically mark the isolated OSDs down?

Lorenzo
-- 
Lorenzo Garuti
CED MaxMara
tel: 0522 3993772 - 335 8416054
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
