[
https://issues.apache.org/jira/browse/CLOUDSTACK-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nicolas Vazquez updated CLOUDSTACK-10326:
-----------------------------------------
Attachment: CLOUDSTACK-10326-MigrationFailed.png
CLOUDSTACK-10326-Migrating.png
CLOUDSTACK-10326-InitialState.png
CLOUDSTACK-10326-Debug.png
> Prevent hosts fall into Maintenance when there are running VMs on it
> --------------------------------------------------------------------
>
> Key: CLOUDSTACK-10326
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10326
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the
> default.)
> Affects Versions: 4.11.0.0
> Reporter: Nicolas Vazquez
> Assignee: Nicolas Vazquez
> Priority: Major
> Fix For: 4.11.1.0
>
> Attachments: CLOUDSTACK-10326-Debug.png,
> CLOUDSTACK-10326-InitialState.png, CLOUDSTACK-10326-Migrating.png,
> CLOUDSTACK-10326-MigrationFailed.png
>
>
> This issue was discovered, fixed and tested on KVM, but it applies to every
> hypervisor.
> h2. Background
> When maintenance mode is enabled on a host, the host state is set to
> 'PrepareForMaintenance' and its running VMs are migrated to another host.
> Once every VM has been migrated, the host moves to the 'Maintenance' state.
> The checks are performed in the ResourceManagerImpl.checkAndMaintain() method:
> * List VMs with host_id = HOST_ID
> * List VMs with last_host_id = HOST_ID and state=Migrating
> When both queries return no rows, the host can be put into Maintenance.
> While a VM is being migrated to DEST_HOST, its host_id column is set to
> DEST_HOST, last_host_id = ORIGIN_HOST and state = Migrating. If the
> migration then fails, host_id = last_host_id = ORIGIN_HOST.
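The two checks and the column transitions described above can be sketched as follows. This is a minimal in-memory model and not CloudStack's actual code: `VmRow`, `MaintenanceCheck` and every method name are hypothetical, and the VM state after a failed or completed migration is assumed to be "Running".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a vm_instance row (not CloudStack's real classes).
class VmRow {
    final String name;
    long hostId;      // host_id column
    Long lastHostId;  // last_host_id column, null until the first migration
    String state;     // "Running" or "Migrating"

    VmRow(String name, long hostId) {
        this.name = name;
        this.hostId = hostId;
        this.state = "Running";
    }

    // Migration starts: host_id = DEST_HOST, last_host_id = ORIGIN_HOST, state = Migrating.
    void startMigration(long destHostId) {
        lastHostId = hostId;
        hostId = destHostId;
        state = "Migrating";
    }

    // Migration fails: host_id = last_host_id = ORIGIN_HOST.
    // Assumption: the state goes back to "Running". Only valid after startMigration().
    void failMigration() {
        hostId = lastHostId;
        state = "Running";
    }

    // Assumption: a successful migration leaves the VM running on DEST_HOST.
    void completeMigration() {
        state = "Running";
    }
}

class MaintenanceCheck {
    // First check: no VM with host_id = HOST_ID.
    static boolean noVmsOnHost(List<VmRow> vms, long hostId) {
        for (VmRow vm : vms) {
            if (vm.hostId == hostId) return false;
        }
        return true;
    }

    // Second check: no VM with last_host_id = HOST_ID and state = Migrating.
    static boolean noVmsMigratingFrom(List<VmRow> vms, long hostId) {
        for (VmRow vm : vms) {
            if (vm.lastHostId != null && vm.lastHostId == hostId
                    && "Migrating".equals(vm.state)) return false;
        }
        return true;
    }

    // The host may enter Maintenance only when both queries are empty.
    static boolean canEnterMaintenance(List<VmRow> vms, long hostId) {
        return noVmsOnHost(vms, hostId) && noVmsMigratingFrom(vms, hostId);
    }

    public static void main(String[] args) {
        long originHost = 1L, destHost = 2L;
        List<VmRow> vms = new ArrayList<>();
        VmRow vm = new VmRow("vm-1", originHost);
        vms.add(vm);

        System.out.println(canEnterMaintenance(vms, originHost)); // false: VM is on the host
        vm.startMigration(destHost);
        System.out.println(canEnterMaintenance(vms, originHost)); // false: VM is migrating away
        vm.completeMigration();
        System.out.println(canEnterMaintenance(vms, originHost)); // true: nothing left on the host
    }
}
```

In this model the combined check behaves as intended as long as no row changes while the two queries are being evaluated.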
> h2. Issue
> The following sequence triggers the problem:
> * Enable maintenance mode on ORIGIN_HOST
> * VMs start being migrated to a host, say DEST_HOST
> * checkAndMaintain() starts:
> ** The first check passes (no VM has host_id = ORIGIN_HOST, as those VMs
> are being migrated)
> ** Before the second check runs, one or more migrations fail
> ** The second check passes, even though VMs are still running on the host
> because their migrations have failed.
> * Host goes into Maintenance state.
> Screenshots are attached; the query executed in each case was:
> select id, name, instance_name, state, host_id, last_host_id from vm_instance;
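The interleaving listed under "Issue" can be reproduced deterministically in a small model. The sketch below is an illustration under stated assumptions, not the CloudStack implementation: `MaintenanceRaceSketch`, `Vm`, `firstCheck`, `secondCheck` and `checkWithInterleaving` are hypothetical names, the `betweenChecks` hook stands in for a migration failing between the two queries, and a failed migration is assumed to leave the VM in the "Running" state.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical model of the race; names are illustrative, not CloudStack APIs.
class MaintenanceRaceSketch {

    static final class Vm {
        long hostId;      // host_id
        Long lastHostId;  // last_host_id
        String state = "Running";

        Vm(long hostId) { this.hostId = hostId; }

        void startMigration(long destHostId) {
            lastHostId = hostId;
            hostId = destHostId;
            state = "Migrating";
        }

        // host_id = last_host_id = ORIGIN_HOST; state assumed to return to "Running".
        void failMigration() {
            hostId = lastHostId;
            state = "Running";
        }
    }

    // First check: no VM with host_id = host.
    static boolean firstCheck(List<Vm> vms, long host) {
        return vms.stream().noneMatch(vm -> vm.hostId == host);
    }

    // Second check: no VM with last_host_id = host and state = Migrating.
    static boolean secondCheck(List<Vm> vms, long host) {
        return vms.stream().noneMatch(vm ->
                vm.lastHostId != null && vm.lastHostId == host && "Migrating".equals(vm.state));
    }

    // Runs the two checks one after the other and lets an event happen in between.
    static boolean checkWithInterleaving(List<Vm> vms, long host, Runnable betweenChecks) {
        boolean first = firstCheck(vms, host);
        betweenChecks.run();
        boolean second = secondCheck(vms, host);
        return first && second;
    }

    public static void main(String[] args) {
        long originHost = 1L, destHost = 2L;
        Vm vm = new Vm(originHost);
        List<Vm> vms = Collections.singletonList(vm);

        vm.startMigration(destHost);
        // The migration fails after the first query but before the second one.
        boolean allowed = checkWithInterleaving(vms, originHost, vm::failMigration);

        System.out.println("maintenance allowed: " + allowed);                         // true
        System.out.println("VM still on origin host: " + !firstCheck(vms, originHost)); // true
    }
}
```

Both checks pass here only because they observe the rows at different moments; evaluating them against a single consistent snapshot of the same rows would have found the VM in one query or the other.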