Hello guys! I would like to suggest a few changes to the Fuel HA/scalability features.

1. [HA] Ensure the public/management VIP runs on a node where HAProxy is working.

Currently, if HAProxy dies, the VIP is not moved to another node in the cluster. HAProxy can die after a segfault, a wrong config, an uninstalled package, and so on. A simple way to check this is:

    # echo deadbeef >> /etc/haproxy/haproxy.cfg
    # /etc/init.d/haproxy stop

What happens:
- Corosync cannot start HAProxy
- Corosync will NOT move the VIP to another node
- ALL connections to the VIPs get 'connection refused'

What should happen:
- Corosync cannot start HAProxy
- Corosync moves the VIP to another node

Gerrit change: http://gerrit.vm.mirantis.net:8080/#/c/15617/

For now, ocf:mirantis:haproxy only checks whether haproxy is running. In the future we can implement more sophisticated health checks (backend timeouts, current connection limits, ...).

2. [HA] Tune the TCP keepalive sysctls.

Currently we use the default Ubuntu/CentOS values (7200 + 9*75 = 7875 s). This means the kernel notices a 'silent' connection failure (no RST, no FIN) only after more than 2 hours. In my experience a good value for HA systems is 180 s (120 + 3*20):

    net.ipv4.tcp_keepalive_time = 120
    net.ipv4.tcp_keepalive_probes = 3
    net.ipv4.tcp_keepalive_intvl = 20

Gerrit change: http://gerrit.vm.mirantis.net:8080/#/c/15618/

3. [Scalability] Shuffle the AMQP nodes in the OpenStack configs.

Currently each OpenStack node (compute, cinder, ...) connects to controller #1; after a failure it reconnects to #2, and after that to #3. As a result, ALL AMQP traffic is served by #1. We can shuffle 'rabbit_hosts' on each node so that the load is spread across the controllers.

Gerrit change: http://gerrit.vm.mirantis.net:8080/#/c/15619/

Best Regards,
Bartosz Kupidura
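
P.S. For item 2 (TCP keepalive), here is a small sketch of the arithmetic behind the numbers. It assumes the usual model: the kernel waits tcp_keepalive_time seconds on an idle connection, then sends tcp_keepalive_probes probes spaced tcp_keepalive_intvl seconds apart before declaring the peer dead. The function and dictionary names are made up for illustration only.

```python
def keepalive_detection_seconds(time_s: int, probes: int, intvl_s: int) -> int:
    """Worst-case time to notice a silent peer on an idle connection.

    time_s  : net.ipv4.tcp_keepalive_time   (idle time before the first probe)
    probes  : net.ipv4.tcp_keepalive_probes (unanswered probes before giving up)
    intvl_s : net.ipv4.tcp_keepalive_intvl  (seconds between probes)
    """
    return time_s + probes * intvl_s


# Values quoted in the mail: distribution defaults vs. the proposed tuning.
DEFAULTS = {"time_s": 7200, "probes": 9, "intvl_s": 75}
PROPOSED = {"time_s": 120, "probes": 3, "intvl_s": 20}

if __name__ == "__main__":
    for name, cfg in (("default", DEFAULTS), ("proposed", PROPOSED)):
        print(name, keepalive_detection_seconds(**cfg), "seconds")
```

Note that these sysctls only affect sockets that have SO_KEEPALIVE enabled, so the gain depends on the services actually turning keepalive on.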

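P.P.S. For item 3 (shuffling 'rabbit_hosts'), a minimal sketch of one possible approach. This is not necessarily what the Gerrit change does. It assumes we want an order that differs between nodes but stays the same on every run for a given node, so that the config file does not change each time it is regenerated. The controller addresses, node names, and function names below are hypothetical; 5672 is the standard AMQP port.

```python
import random


def shuffled_rabbit_hosts(hosts, node_name):
    """Return a copy of hosts in a per-node order that is stable across runs.

    Seeding a private RNG with the node name (an assumption of this sketch)
    gives each node its own order without touching the global random state.
    """
    rng = random.Random(node_name)
    result = list(hosts)  # copy, so the caller's list is left untouched
    rng.shuffle(result)
    return result


def rabbit_hosts_line(hosts, node_name, port=5672):
    """Build the 'rabbit_hosts' config line for one node."""
    ordered = shuffled_rabbit_hosts(hosts, node_name)
    return "rabbit_hosts = " + ",".join(f"{host}:{port}" for host in ordered)


if __name__ == "__main__":
    # Hypothetical controller addresses and node names.
    controllers = ["192.168.0.2", "192.168.0.3", "192.168.0.4"]
    for node in ("compute-1", "compute-2", "cinder-1"):
        print(node, "->", rabbit_hosts_line(controllers, node))
```

With only three controllers some nodes will still end up with the same order, but across many compute nodes the first-choice controller should be spread roughly evenly.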
