[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425993#comment-15425993
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9458:
--------------------------------------------

Github user koushik-das commented on the issue:

    https://github.com/apache/cloudstack/pull/1640
  
    Please use a proper ACS release for reporting bugs. In your case you may
    have to do some additional cherry-picks.
    
    "Schedule restart" does multiple tasks. There is a method by this name in
    the code; the name may not be the most appropriate, so don't let it confuse
    you. The method does the following:
    1. Tries to find out whether the VM is alive.
    2. If it cannot determine conclusively whether the VM is alive, it tries
       to fence off the VM.
    3. After successful fencing, HA-enabled VMs are restarted on another host
       and non-HA VMs are marked as Stopped.
    
    So, as you can see, non-HA VMs are not restarted when the host is
    determined to be down; they are simply marked as Stopped. It makes sense to
    mark them as Stopped so that subsequent operations can be performed on
    them, e.g. selected VMs may be explicitly started on another host. If a
    host is down, power sync won't happen for that host and the VM states on
    that host won't get updated.
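
The three steps listed above can be sketched roughly as follows. This is only an illustration of the described decision flow, not the actual CloudStack code: `HaRestartSketch`, `Vm`, `Liveness`, `Outcome`, and the investigator/fencer parameters are hypothetical names. It also assumes that a VM found alive is left untouched and that a VM conclusively found dead is handled like a successfully fenced one, neither of which is stated in the comment.

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of the "schedule restart" flow; not CloudStack's real API.
class HaRestartSketch {
    enum Liveness { ALIVE, DEAD, UNKNOWN }
    enum Outcome { LEFT_RUNNING, FENCING_FAILED, RESTARTED_ON_OTHER_HOST, MARKED_STOPPED }

    static final class Vm {
        final String name;
        final boolean haEnabled;

        Vm(String name, boolean haEnabled) {
            this.name = name;
            this.haEnabled = haEnabled;
        }
    }

    static Outcome scheduleRestart(Vm vm,
                                   Function<Vm, Liveness> investigator,
                                   Predicate<Vm> fencer) {
        // Step 1: try to find out whether the VM is alive.
        Liveness liveness = investigator.apply(vm);
        if (liveness == Liveness.ALIVE) {
            // Assumption: a VM that is known to be alive is left as it is.
            return Outcome.LEFT_RUNNING;
        }
        // Step 2: if liveness is inconclusive, try to fence off the VM.
        if (liveness == Liveness.UNKNOWN && !fencer.test(vm)) {
            return Outcome.FENCING_FAILED;
        }
        // Step 3: HA-enabled VMs are restarted elsewhere, non-HA VMs are
        // marked as Stopped. Assumption: DEAD is treated like "fenced".
        return vm.haEnabled ? Outcome.RESTARTED_ON_OTHER_HOST : Outcome.MARKED_STOPPED;
    }

    public static void main(String[] args) {
        Vm haVm = new Vm("ha-vm", true);
        Vm plainVm = new Vm("plain-vm", false);
        // Host unreachable: liveness unknown, fencing succeeds.
        System.out.println(scheduleRestart(haVm, v -> Liveness.UNKNOWN, v -> true));
        System.out.println(scheduleRestart(plainVm, v -> Liveness.UNKNOWN, v -> true));
    }
}
```

In this sketch the non-HA VM ends up as `MARKED_STOPPED` rather than restarted, which is the behavior described above for a host that is determined to be down.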


> Some VMs are being stopped when agent is reconnecting
> -----------------------------------------------------
>
>                 Key: CLOUDSTACK-9458
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9458
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>            Reporter: Marc-Aurèle Brothier
>            Assignee: Marc-Aurèle Brothier
>
> If you lose communication between the management server and one of the 
> agents for a few minutes, the HighAvailabilityManager kicks in and starts to 
> schedule VM restarts, even though HA mode is not active. Those tasks are 
> inserted as async jobs in the DB, and if the agent comes back online while 
> the jobs are still in the async table, they are pushed to the agent and shut 
> down the VMs. Then, since HA is not active, the VMs are not restarted.
> In my opinion, the expected behavior is that VMs should not be restarted at 
> all if HA mode is not active on them, and that the agent should be left to 
> update the VM state with the power report.
> The bug lies in 
> {{HighAvailabilityManagerImpl.scheduleRestartForVmsOnHost(final HostVO host, 
> boolean investigate)}}; a PR will follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
