[jira] [Commented] (CLOUDSTACK-3367) When one primary storage fails, all XenServer hosts get rebooted, killing all VMs, even those not on this primary storage.

Alex Huang (JIRA) Fri, 26 Jul 2013 18:50:47 -0700

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721491#comment-13721491 ]


Alex Huang commented on CLOUDSTACK-3367:
----------------------------------------

In our testing with XenServer 5.6, when we attempted to stop the VMs through
XenServer while the storage was unavailable, XenServer sometimes failed to shut
them down cleanly because of the storage problems, which led to further
problems later on. That is why we chose to reboot the host instead of stopping
the VMs.

You also have to consider how often this happens. If a storage server needs to
be taken out of service, it should first be put into maintenance mode, which
shuts down the VMs that use it. In that case the hosts are not rebooted.
Therefore, this can only happen during an unscheduled outage of the storage
server.

We can make a few changes so that this happens less often:

- Don't put a heartbeat on the storage until a VM using that storage is running
on a host.
- Remove the heartbeat on the storage once all VMs using that storage have
stopped.
- Try to stop the VMs within a short interval, and reboot only if they cannot
be stopped within that interval.
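The third suggestion (stop first, reboot only as a fallback) could look roughly
like the shell sketch below. It is a hypothetical illustration, not CloudStack's
or xenheartbeat.sh's actual code: the function name `stop_or_reboot` is
invented, the stop command and the fallback command are supplied by the caller,
and it assumes GNU coreutils `timeout` is available on the host.

```shell
#!/bin/sh
# Hypothetical sketch of "try to stop the VMs within a short interval,
# then reboot". Not part of CloudStack or xenheartbeat.sh.
# Assumes GNU coreutils `timeout` is installed.
#
# Usage: stop_or_reboot LIMIT_SECS FALLBACK_CMD STOP_CMD [ARGS...]
stop_or_reboot() {
    limit=$1      # how long to wait for the stop command, in seconds
    fallback=$2   # command string to run if the stop fails or times out
    shift 2

    # timeout exits non-zero (124 on expiry) if STOP_CMD does not finish
    # in time, and otherwise passes through STOP_CMD's own exit status.
    if timeout "$limit" "$@"; then
        echo "stopped"
        return 0
    fi

    echo "stop did not succeed within ${limit}s; running fallback" >&2
    sh -c "$fallback"
    return 1
}
```

For example, `stop_or_reboot 120 "reboot -f" xe vm-shutdown uuid=<vm-uuid> --force`
would give a forced shutdown of one affected VM two minutes before falling back
to the `reboot -f` that the heartbeat script uses today (the 120-second limit is
an arbitrary illustration). Bounding the wait matters because, as described
above, a stop issued against unavailable storage may otherwise hang.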
                
> When one primary storage fails, all XenServer hosts get rebooted, killing all 
> VMs, even those not on this primary storage.
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-3367
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3367
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>          Components: Management Server, XenServer
>    Affects Versions: 4.1.0, 4.2.0
>         Environment: CentOS 6.3, XenServer 6.0.2 + all hotfixes, CloudStack 4.1.0
>            Reporter: France
>            Priority: Critical
>             Fix For: Future
>
>
> As the title says: if only one of the primary storages fails, all XenServer
> hosts get rebooted one by one. Because I have many primary storages, which
> are/were running fine with other VMs, rebooting the XenServer hypervisor is
> overkill. Please disable this, or implement just stopping/killing the VMs
> running on that storage and trying to re-attach only that storage.
> The problem was reported on the mailing list, along with a workaround for
> XenServer, so I'm not the only one hit by this "bug/feature". The workaround
> for now is as follows:
> 1. Modify /opt/xensource/bin/xenheartbeat.sh on all your hosts, commenting
> out the two entries that contain "reboot -f".
> 2. Identify the PID of the script: pidof -x xenheartbeat.sh
> 3. Kill the running script so that it can be restarted: kill <pid>
> 4. Force reconnect the host from the UI; the script will then relaunch on
> reconnect.
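Steps 1 to 3 of the workaround could be scripted per host roughly as follows.
This is a hypothetical sketch, not an official CloudStack or XenServer tool: the
function names are invented, and it assumes GNU sed (for `-i.bak`) and that
xenheartbeat.sh is a bash script, so `bash -n` can parse-check it after editing.

```shell
#!/bin/bash
# Hypothetical helpers for workaround steps 1-3; run as root in dom0.
# Assumes GNU sed and that xenheartbeat.sh is a bash script.

# Step 1: comment out every line containing "reboot -f".
# sed -i.bak edits in place and keeps the original as FILE.bak.
disable_forced_reboot() {
    script=${1:-/opt/xensource/bin/xenheartbeat.sh}
    sed -i.bak '/reboot -f/ s/^/#/' "$script" || return 1

    # Commenting out the only command in an if/then body is a bash syntax
    # error, so parse-check the edited file and roll back if it fails.
    if ! bash -n "$script"; then
        echo "syntax check failed; restoring $script.bak" >&2
        mv "$script.bak" "$script"
        return 1
    fi
}

# Steps 2-3: find the running heartbeat script(s) and kill them.
stop_heartbeat() {
    pids=$(pidof -x xenheartbeat.sh) || return 0   # nothing running
    kill $pids   # unquoted on purpose: pidof may return several PIDs
}
```

Running `disable_forced_reboot && stop_heartbeat` on each host covers steps 1 to
3; step 4 (force reconnecting the host from the UI, which relaunches the script)
still has to be done by hand.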
