ASF GitHub Bot commented on CLOUDSTACK-9458:

Github user abhinandanprateek commented on the issue:

    @jburwell @koushik-das @marcaurele When the MS is unable to determine the
state of a VM, or it thinks the VM requires an HA operation, it issues a stop
command as part of the fence operation.
    The effect of this is to clean up the resources on the MS and keep the
resource bookkeeping on the MS intact. In some boundary cases this can kill a
healthy VM, and we need to fix those boundary cases.
    If this cleanup/fence operation does not happen on the MS, the resource
allocation on the MS will be out of sync with the actual capacity, which causes
further complications and issues.
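
The trade-off described in this comment can be illustrated with a toy model. This is a hypothetical sketch, not CloudStack code. The class `FenceBookkeepingSketch`, its methods and the per-VM CPU figures are all invented for the example. The model makes two assumptions: the MS keeps counting capacity for every VM it has not marked as stopped, and the stop command sent during a fence reaches the hypervisor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model (hypothetical, not CloudStack code) contrasting the management
 * server's (MS) capacity bookkeeping with what is really running on a host.
 */
public class FenceBookkeepingSketch {

    enum MsState { RUNNING, UNKNOWN, STOPPED }

    static final class Vm {
        final int cpuMhz;
        MsState msState = MsState.RUNNING; // what the MS believes
        boolean actuallyRunning = true;    // ground truth on the hypervisor

        Vm(int cpuMhz) { this.cpuMhz = cpuMhz; }
    }

    final Map<String, Vm> vms = new LinkedHashMap<>();

    void start(String name, int cpuMhz) { vms.put(name, new Vm(cpuMhz)); }

    // Hypervisor-side failure the MS does not hear about while the agent is down.
    void crash(String name) { vms.get(name).actuallyRunning = false; }

    // Agent stops responding: the MS can no longer confirm any VM's state.
    void agentDisconnected() {
        for (Vm vm : vms.values()) {
            if (vm.msState == MsState.RUNNING) {
                vm.msState = MsState.UNKNOWN;
            }
        }
    }

    // Fence: the stop reaches the hypervisor (so a healthy VM is killed too),
    // and the MS releases the capacity it had reserved for the VM.
    void fence(String name) {
        Vm vm = vms.get(name);
        vm.msState = MsState.STOPPED;
        vm.actuallyRunning = false;
    }

    // Capacity the MS still counts as allocated (any VM not known to be stopped).
    int allocatedMhz() {
        int total = 0;
        for (Vm vm : vms.values()) {
            if (vm.msState != MsState.STOPPED) total += vm.cpuMhz;
        }
        return total;
    }

    // Capacity actually consumed on the host.
    int actualMhz() {
        int total = 0;
        for (Vm vm : vms.values()) {
            if (vm.actuallyRunning) total += vm.cpuMhz;
        }
        return total;
    }

    public static void main(String[] args) {
        FenceBookkeepingSketch noFence = new FenceBookkeepingSketch();
        noFence.start("vm-1", 1000);
        noFence.start("vm-2", 2000);
        noFence.crash("vm-1");
        noFence.agentDisconnected();
        // Without fencing, the MS keeps counting the dead vm-1's capacity.
        System.out.println("no fence: allocated=" + noFence.allocatedMhz()
                + " actual=" + noFence.actualMhz());

        FenceBookkeepingSketch withFence = new FenceBookkeepingSketch();
        withFence.start("vm-1", 1000);
        withFence.start("vm-2", 2000);
        withFence.crash("vm-1");
        withFence.agentDisconnected();
        withFence.fence("vm-1");
        withFence.fence("vm-2"); // vm-2 was healthy but is stopped anyway
        System.out.println("fenced: allocated=" + withFence.allocatedMhz()
                + " actual=" + withFence.actualMhz());
    }
}
```

Running `main` prints `no fence: allocated=3000 actual=2000` and then `fenced: allocated=0 actual=0`. Skipping the fence leaves 1000 MHz booked for a VM that has already died. Fencing both VMs restores consistency, but at the cost of stopping the healthy `vm-2`.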

> Some VMs are being stopped when agent is reconnecting
> -----------------------------------------------------
>                 Key: CLOUDSTACK-9458
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9458
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the 
> default.) 
>            Reporter: Marc-Aurèle Brothier
>            Assignee: Marc-Aurèle Brothier
> If you lose the communication between the management server and one of the
> agents for a few minutes, the HighAvailabilityManager kicks in and starts to
> schedule VM restarts, even though HA mode is not active. Those tasks are
> inserted as async jobs in the DB. If the agent comes back online while the
> jobs are still in the async table, they are pushed to the agent and shut down
> the VMs. Then, since HA is not active, the VMs are not restarted.
> In my opinion, the expected behavior is that the VMs should not be restarted
> at all if HA mode is not active on them. Instead, the agent should be left to
> update the VM state with its power report.
> The bug lies in 
> {{HighAvailabilityManagerImpl.scheduleRestartForVmsOnHost(final HostVO host, 
> boolean investigate)}}; a PR will follow.
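
The behavior the reporter expects can be sketched as a per-VM filter applied when a host's agent disconnects. This is a hypothetical illustration and not the actual fix or CloudStack's API. `HaRestartFilterSketch`, `VmInfo`, `Action`, `decide` and `vmsToSchedule` are all invented names. The sketch assumes only that each VM record exposes whether HA is enabled for it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the actual CloudStack patch): when a host's agent
 * disconnects, queue HA work only for VMs with HA enabled, and leave every
 * other VM's state to the agent's power-state report once it reconnects.
 */
public class HaRestartFilterSketch {

    // Hypothetical stand-in for a VM record; not CloudStack's VMInstanceVO.
    static final class VmInfo {
        final String name;
        final boolean haEnabled;

        VmInfo(String name, boolean haEnabled) {
            this.name = name;
            this.haEnabled = haEnabled;
        }
    }

    // Hypothetical outcome of the per-VM decision.
    enum Action { SCHEDULE_HA_RESTART, WAIT_FOR_POWER_REPORT }

    static Action decide(VmInfo vm) {
        return vm.haEnabled ? Action.SCHEDULE_HA_RESTART : Action.WAIT_FOR_POWER_REPORT;
    }

    // Returns the names of the VMs for which an HA job would be queued.
    static List<String> vmsToSchedule(List<VmInfo> vmsOnHost) {
        List<String> scheduled = new ArrayList<>();
        for (VmInfo vm : vmsOnHost) {
            if (decide(vm) == Action.SCHEDULE_HA_RESTART) {
                scheduled.add(vm.name);
            }
        }
        return scheduled;
    }

    public static void main(String[] args) {
        List<VmInfo> vms = List.of(
                new VmInfo("web-1", true),
                new VmInfo("db-1", false),
                new VmInfo("batch-1", false));
        System.out.println(vmsToSchedule(vms)); // [web-1]
    }
}
```

Only `web-1` gets an HA job. The other two VMs are left alone, so no queued stop can reach them when the agent reconnects. The flip side is the concern raised in the GitHub comment: until the power report arrives, the MS bookkeeping for those VMs may not match the host.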

This message was sent by Atlassian JIRA
