Lennert den Teuling created CLOUDSTACK-3954:
-----------------------------------------------
Summary: HA with Security Groups and ping disabled will cause
split-brain
Key: CLOUDSTACK-3954
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3954
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: KVM
Affects Versions: 4.1.0
Environment: Tested with CS 4.1 on Ubuntu, but the issue probably
exists in other versions
Reporter: Lennert den Teuling
Priority: Critical
We found that running CS 4.1 on KVM with Security Groups enabled and ping
disabled (the default) causes a split-brain when the agent crashes.
How to reproduce:
1. Set up a Basic Zone with SG enabled
2. Create one or more HA-enabled VMs with a security group that does not
allow ping (the default).
3. Kill the agent on one of the hosts
When you do this, the HA component on the management server will restart all
of that host's VMs on another node, even though they are still running and
the VM host is still pingable. Each VM then runs in two places at once, which
will likely corrupt all VMs on the host where the agent was stopped/killed.
We had some issues with libvirt causing the agent to disconnect. Luckily some
VMs allowed ping so nothing bad happened.
Temporary fix:
Ensure at least one of the running VMs on each host allows ping, so the HA
manager will be able to ping it and will not HA the host.
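
As a rough sketch of the workaround, ping can be allowed by adding an ICMP
echo-request ingress rule (type 8, code 0) to a security group through the
CloudStack API command authorizeSecurityGroupIngress. The snippet below only
builds the query parameters. The group name "default" and the CIDR are
placeholder assumptions, the helper name is hypothetical, and the request
would still need the usual API key and signature before it could be sent.

```python
from urllib.parse import urlencode


def icmp_echo_ingress_params(security_group_name, cidr):
    """Build query parameters for an ingress rule that permits ping.

    Hypothetical helper: it only assembles parameters and does not
    contact the management server.
    """
    return {
        "command": "authorizeSecurityGroupIngress",
        "securitygroupname": security_group_name,
        "protocol": "icmp",
        "icmptype": 8,  # ICMP echo request
        "icmpcode": 0,
        "cidrlist": cidr,  # assumed source range that must be able to ping
        "response": "json",
    }


# Placeholder values; substitute the real group name and source range.
params = icmp_echo_ingress_params("default", "10.0.0.0/8")

# Unsigned query string; apiKey and signature are deliberately left out.
query = urlencode(sorted(params.items()))
print(query)
```

Which source range the rule has to admit depends on where the HA check sends
its pings from, which this report does not establish, so the CIDR should be
chosen with care.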
I'm not sure yet why this happens, but wanted to file this bug so people can
take the necessary precautions.