[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727573#comment-13727573
 ] 

Lennert den Teuling edited comment on CLOUDSTACK-3954 at 8/2/13 1:10 PM:
-------------------------------------------------------------------------

I would like to add that, over a period of time, a split-brain will eventually 
occur even if one of the VMs allows ping. The VMs that allow ping will not be 
started on another host; the VMs that don't will be restarted even while they 
are running. 

EDIT:
After looking further into this, I think the issue is that we do not check 
whether the hypervisor itself is pingable. If we did, this issue would not 
exist. If the hypervisor is pingable, HA should not be triggered, because we 
cannot be sure whether the VMs are running or not. 
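
A rough sketch of the proposed check, in Python for brevity. This is not 
CloudStack code: HostStatus and should_restart_vms_elsewhere are made-up names, 
and the sketch assumes the management server can ping the hypervisor directly 
and already has per-VM ping results. 

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HostStatus:
    """Hypothetical snapshot of what the management server can observe."""
    agent_connected: bool        # is the agent still reporting in?
    host_pingable: bool          # does the hypervisor itself answer ping?
    vm_pingable: Sequence[bool]  # per-VM ping results (False if the SG blocks ICMP)


def should_restart_vms_elsewhere(status: HostStatus) -> bool:
    """Return True only when HA looks safe under the proposed rule."""
    if status.agent_connected:
        # Agent is healthy, so there is nothing to recover.
        return False
    if status.host_pingable:
        # Host is up but the agent is gone: VM state is unknown, and
        # restarting elsewhere could leave two copies running (split-brain).
        return False
    if any(status.vm_pingable):
        # At least one VM still answers, so the host is evidently alive.
        return False
    # Neither the agent, the host, nor any VM responds.
    return True
```

With the situation from this issue (agent killed, host still pingable, no VM 
allowing ping), the function returns False, so the running VMs would not be 
started a second time. 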


                
                  
> HA with Security Groups and ping disabled will cause split-brain
> ----------------------------------------------------------------
>
>                 Key: CLOUDSTACK-3954
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3954
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: KVM
>    Affects Versions: 4.1.0
>         Environment: Tested this with CS 4.1 on Ubuntu, but will probably 
> exist in other versions
>            Reporter: Lennert den Teuling
>            Priority: Critical
>
> We found that running CS 4.1 on KVM with Security Groups enabled and ping 
> disabled (the default) causes a split-brain when the agent crashes. 
> How to reproduce:
> 1. Set up a Basic Zone with SG enabled.
> 2. Create one or more HA-enabled VMs with a security group that does not 
> allow ping (the default). 
> 3. Kill the agent on one of the hosts.
> When you do this, the HA component on the management server restarts all 
> VMs on another node, even though they are still running and the VM host is 
> still pingable. This will likely corrupt all VMs on the host where the agent 
> was stopped or killed. 
> We had some issues with libvirt causing the agent to disconnect. Luckily, 
> some VMs allowed ping, so nothing bad happened. 
> Temporary fix:
> Ensure that at least one of the running VMs on each host allows ping, so the 
> HA manager can ping it and will not HA the host. 
> I'm not sure yet why this happens, but I wanted to file this bug so people 
> can take the necessary precautions. 
