[ https://issues.apache.org/jira/browse/CLOUDSTACK-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508501#comment-16508501 ]

ASF GitHub Bot commented on CLOUDSTACK-10326:
---------------------------------------------

nvazquez opened a new pull request #2493: CLOUDSTACK-10326: Prevent hosts fall into Maintenance when there are running VMs on it
URL: https://github.com/apache/cloudstack/pull/2493
 
 
   JIRA Ticket: https://issues.apache.org/jira/browse/CLOUDSTACK-10326
   
   This issue was discovered, fixed and tested on KVM, but it applies to every hypervisor.
   
   ### Background
   When maintenance mode is enabled on a host, the host state is set to 'PrepareForMaintenance' and its running VMs are migrated to other hosts. Once every VM has been migrated, the host goes into the 'Maintenance' state.
   
   Checks are performed in the `ResourceManagerImpl.checkAndMaintain()` method:
   
   - List VMs with host_id = HOST_ID
   - List VMs with last_host_id = HOST_ID and state=Migrating
   
   When both queries return no rows, the host can be put into Maintenance.
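The two checks can be modeled as two filters over `vm_instance` rows, combined with a logical AND. This is a minimal Python sketch, not CloudStack's Java code in `ResourceManagerImpl`. The `Vm` record and the `vms_on_host`, `vms_migrating_from` and `can_enter_maintenance` helpers are hypothetical. Each "query" is a list comprehension over in-memory rows that mirror the `host_id`, `last_host_id` and `state` columns.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Vm:
    # Hypothetical stand-in for a vm_instance row (only the columns the checks use).
    name: str
    state: str
    host_id: Optional[int]
    last_host_id: Optional[int]


def vms_on_host(vms: List[Vm], host_id: int) -> List[Vm]:
    # First check: VMs with host_id = HOST_ID.
    return [vm for vm in vms if vm.host_id == host_id]


def vms_migrating_from(vms: List[Vm], host_id: int) -> List[Vm]:
    # Second check: VMs with last_host_id = HOST_ID and state = Migrating.
    return [vm for vm in vms if vm.last_host_id == host_id and vm.state == "Migrating"]


def can_enter_maintenance(vms: List[Vm], host_id: int) -> bool:
    # The host may go into Maintenance only when both queries return no rows.
    return not vms_on_host(vms, host_id) and not vms_migrating_from(vms, host_id)


if __name__ == "__main__":
    # A VM still migrating away from host 4 to host 1 blocks Maintenance via the second check.
    rows = [Vm("vm-a", "Migrating", host_id=1, last_host_id=4)]
    print(can_enter_maintenance(rows, 4))  # False
```

Note that a VM in the middle of a migration is invisible to the first check, because its `host_id` already points at the destination. Only the second check catches it.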
   
   When a VM is being migrated to DEST_HOST, its host_id column is set to DEST_HOST, last_host_id to ORIGIN_HOST and state to Migrating. If the migration then fails, host_id = last_host_id = ORIGIN_HOST.
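The column updates above can be sketched as two hypothetical helpers, `start_migration` and `fail_migration`, acting on a row with the same three fields. This is an illustrative model only. The assumption that a VM returns to `Running` after a failed migration comes from the failed-migration screenshot in this report, not from CloudStack's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmRow:
    # Hypothetical stand-in for the vm_instance columns involved in migration.
    state: str
    host_id: Optional[int]
    last_host_id: Optional[int]


def start_migration(vm: VmRow, origin: int, dest: int) -> None:
    # While migrating: host_id = DEST_HOST, last_host_id = ORIGIN_HOST, state = Migrating.
    vm.host_id = dest
    vm.last_host_id = origin
    vm.state = "Migrating"


def fail_migration(vm: VmRow, origin: int) -> None:
    # On failure: host_id = last_host_id = ORIGIN_HOST.
    # Assumption: the VM keeps running on the origin host, so state goes back to Running.
    vm.host_id = origin
    vm.last_host_id = origin
    vm.state = "Running"


if __name__ == "__main__":
    vm = VmRow(state="Running", host_id=4, last_host_id=None)
    start_migration(vm, origin=4, dest=1)
    print(vm)  # VmRow(state='Migrating', host_id=1, last_host_id=4)
    fail_migration(vm, origin=4)
    print(vm)  # VmRow(state='Running', host_id=4, last_host_id=4)
```

After `fail_migration`, the row matches the first check (`host_id = ORIGIN_HOST`) again but no longer matches the second one (`state` is not `Migrating`).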
   
   ### Issue
   This sequence:
   
   - Enable maintenance mode on ORIGIN_HOST
   - VMs start being migrated to another host, say DEST_HOST
   - `checkAndMaintain()` starts:
      - First check passes (no VM has host_id = ORIGIN_HOST_ID, as those VMs are being migrated)
      - Before the second check, one or more migrations fail
      - Second check passes (no VM with last_host_id = ORIGIN_HOST_ID is still in Migrating state), even though VMs are running on the host because their migrations failed
   - Host goes into Maintenance state.
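The interleaving can be reproduced deterministically with a small model. This Python sketch is a hypothetical illustration, neither the PR's fix nor CloudStack's implementation. `check_and_maintain` is an assumed function that runs the two queries one after the other. The `between_checks` callback stands in for a migration failing in the gap between them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Vm:
    # Hypothetical stand-in for a vm_instance row.
    state: str
    host_id: Optional[int]
    last_host_id: Optional[int]


def check_and_maintain(vms: List[Vm], host_id: int,
                       between_checks: Callable[[], None] = lambda: None) -> bool:
    # First query: VMs with host_id = HOST_ID.
    if any(vm.host_id == host_id for vm in vms):
        return False
    # Simulated concurrent event (e.g. a migration failing) between the two queries.
    between_checks()
    # Second query: VMs with last_host_id = HOST_ID and state = Migrating.
    if any(vm.last_host_id == host_id and vm.state == "Migrating" for vm in vms):
        return False
    return True  # the host would be moved to Maintenance


ORIGIN, DEST = 4, 1
vm = Vm(state="Migrating", host_id=DEST, last_host_id=ORIGIN)


def migration_fails() -> None:
    # Failure path: host_id = last_host_id = ORIGIN_HOST, VM back to Running (assumed).
    vm.host_id = ORIGIN
    vm.last_host_id = ORIGIN
    vm.state = "Running"


entered = check_and_maintain([vm], ORIGIN, between_checks=migration_fails)
print(entered, vm)  # True Vm(state='Running', host_id=4, last_host_id=4)
```

Both checks pass even though the VM ends up `Running` on the origin host, which is the sequence listed above. With a fresh `Migrating` row and no failure in between (`between_checks` left at its default), the same function returns `False`, because the second query still sees the row as `Migrating`.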
   
   Screenshots are attached; this query was executed in each case:
   `select id, name, instance_name, state, host_id, last_host_id from vm_instance;`
   
   Before enabling maintenance mode on host 4:
   
![cloudstack-10326-initialstate](https://user-images.githubusercontent.com/5295080/37496971-54f6a482-2894-11e8-9976-5097434608b1.png)
   
   While host 4 is in 'PrepareForMaintenance' and the VM is being migrated to host 1:
   
![cloudstack-10326-migrating](https://user-images.githubusercontent.com/5295080/37497029-a31d6a38-2894-11e8-8e7e-6df725b69252.png)
   
   At this point the first check is performed:
   
![cloudstack-10326-debug1](https://user-images.githubusercontent.com/5295080/37497097-fe6df646-2894-11e8-8dd5-a7e2a8869398.png)
   
   Migrations were made to fail by adding these rules on host 4:
   ````
   # Reject new outgoing connections to libvirt's default QEMU migration port range
   iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 49152:49215 -m comment --comment 'test block migrations'
   # Reject new outgoing connections to the libvirtd TCP port
   iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 16509 -m comment --comment 'test block migrations'
   ````
   The migration failed and the VM went back to Running on host 4:
   
![cloudstack-10326-migrationfailed](https://user-images.githubusercontent.com/5295080/37497071-dcf8ee76-2894-11e8-8064-0870870ba422.png)
   
   The second check passes and the host goes into Maintenance:
   
![cloudstack-10326-debug](https://user-images.githubusercontent.com/5295080/37497109-0bc56ae0-2895-11e8-94e9-a00ed1195502.png)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.


> Prevent hosts fall into Maintenance when there are running VMs on it
> --------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-10326
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10326
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>    Affects Versions: 4.11.0.0
>            Reporter: Nicolas Vazquez
>            Assignee: Nicolas Vazquez
>            Priority: Major
>             Fix For: 4.11.1.0
>
>         Attachments: CLOUDSTACK-10326-Debug.png, 
> CLOUDSTACK-10326-InitialState.png, CLOUDSTACK-10326-Migrating.png, 
> CLOUDSTACK-10326-MigrationFailed.png
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
