[jira] [Commented] (CLOUDSTACK-9458) Some VMs are being stopped when agent is reconnecting

ASF GitHub Bot (JIRA) Mon, 21 Nov 2016 07:00:33 -0800

    [ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15683739#comment-15683739
 ]


ASF GitHub Bot commented on CLOUDSTACK-9458:
--------------------------------------------

Github user marcaurele commented on the issue:

    https://github.com/apache/cloudstack/pull/1640
  
    To get back to your previous comment @koushik-das on the broken scenario: 
what happens if the host is not reachable and the VMs are using remote 
storage? With the fencing operation marking the VM as stopped, does it mean 
that the same remote disk volume is used if the VM is spawned on another host 
(while the original one is still running on the first host)?
    
    @abhinandanprateek if the reason to fence off the VM is to clean up 
resources, IMO this should be the job of the VM sync, on the ping 
command/startup command. If a host is lost, the capacity of the cluster 
should reflect the loss of that host, and the capacity stats should be 
calculated from the hosts that are Up only. When a host comes back (possibly 
with some VMs still running), the startup command should sync the VM states and 
the capacity of the cluster/zone should be updated. 
    In short, cleaning up resources that are not "reachable" anymore should not 
be needed and should not be taken into account when calculating the actual 
capacity of the cluster/zone.
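
    To illustrate the capacity point above, here is a minimal sketch, not 
CloudStack code. The names `CapacitySketch`, `Host`, `HostState` and 
`usableMemoryMb` are hypothetical and only stand in for the real host and 
capacity classes. The idea is simply that hosts that are not Up contribute 
nothing to the computed capacity, so nothing has to be cleaned up for them.

```java
import java.util.Arrays;
import java.util.List;

final class CapacitySketch {
    // Hypothetical host states, not CloudStack's actual status enum.
    enum HostState { UP, DISCONNECTED, DOWN }

    // Hypothetical, minimal host model used only for this illustration.
    static final class Host {
        final String name;
        final HostState state;
        final long memoryMb;

        Host(String name, HostState state, long memoryMb) {
            this.name = name;
            this.state = state;
            this.memoryMb = memoryMb;
        }
    }

    // Sum memory over hosts that are Up; unreachable hosts are skipped.
    static long usableMemoryMb(List<Host> hosts) {
        long total = 0;
        for (Host h : hosts) {
            if (h.state == HostState.UP) {
                total += h.memoryMb;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Host> cluster = Arrays.asList(
                new Host("host-1", HostState.UP, 65536),
                new Host("host-2", HostState.DISCONNECTED, 65536),
                new Host("host-3", HostState.UP, 32768));
        // Only host-1 and host-3 are counted.
        System.out.println(usableMemoryMb(cluster));
    }
}
```

    When a lost host reconnects, its state goes back to Up after the VM 
sync and it is counted again on the next calculation.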


> Some VMs are being stopped when agent is reconnecting
> -----------------------------------------------------
>
>                 Key: CLOUDSTACK-9458
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9458
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>            Reporter: Marc-Aurèle Brothier
>            Assignee: Marc-Aurèle Brothier
>
> If you lose the communication between the management server and one of the 
> agents for a few minutes, even though HA mode is not active the 
> HighAvailibilityManager kicks in and starts scheduling VM restarts. Those 
> tasks are inserted as async jobs in the DB, and if the agent comes back 
> online while the jobs are still in the async table, they are pushed 
> to the agent and shut down the VMs. Then, since HA is not active, the VMs are 
> not restarted.
> The expected behavior in my opinion is that VMs should not be restarted at 
> all if HA mode is not active on them; the agent should update the VM state 
> with the power report.
> The bug lies in 
> {{HighAvailibilityManagerImpl.scheduleRestartForVmsOnHost(final HostVO host, 
> boolean investigate)}}, PR will follow.
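
The expected behavior described above can be sketched as a simple filter. 
This is only an illustration under stated assumptions, not the actual change 
in the PR: `HaRestartSketch`, `Vm`, `haEnabled` and `vmsToScheduleForRestart` 
are hypothetical names and do not come from the CloudStack code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HaRestartSketch {
    // Hypothetical, minimal VM model used only for this illustration.
    static final class Vm {
        final String name;
        final boolean haEnabled;

        Vm(String name, boolean haEnabled) {
            this.name = name;
            this.haEnabled = haEnabled;
        }
    }

    // Return only the VMs for which a restart should be scheduled.
    // VMs without HA are left alone so that the agent's power report
    // can update their state once the agent reconnects.
    static List<Vm> vmsToScheduleForRestart(List<Vm> vmsOnHost) {
        List<Vm> toRestart = new ArrayList<>();
        for (Vm vm : vmsOnHost) {
            if (vm.haEnabled) {
                toRestart.add(vm);
            }
        }
        return toRestart;
    }

    public static void main(String[] args) {
        List<Vm> vms = Arrays.asList(
                new Vm("vm-ha", true),
                new Vm("vm-no-ha", false));
        for (Vm vm : vmsToScheduleForRestart(vms)) {
            System.out.println("schedule restart: " + vm.name);
        }
    }
}
```

With such a guard, no stop or restart job is queued for a VM without HA, so 
nothing is left in the async table to shut it down when the agent comes back.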



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
