[ https://issues.apache.org/jira/browse/CLOUDSTACK-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448760#comment-15448760 ]
ASF GitHub Bot commented on CLOUDSTACK-9458:
--------------------------------------------
Github user abhinandanprateek commented on the issue:
https://github.com/apache/cloudstack/pull/1640
@marcaurele @koushik-das When the management server (MS) thinks that a VM is down,
it issues a stop command. This is done to release the resources tied up for that
VM in the management server DB. However, we have seen several times that this
actually kills a healthy VM. I have seen this issue in an MS cluster with agent.lb
turned on.
The issue is that we do need a state cleanup when a running VM is found to be
stopped on the host. But should that cleanup really induce a shutdown on the host?
Probably not, but again, this is a tricky boundary condition.
> Some VMs are being stopped when agent is reconnecting
> -----------------------------------------------------
>
> Key: CLOUDSTACK-9458
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9458
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public (Anyone can view this level - this is the default.)
> Reporter: Marc-Aurèle Brothier
> Assignee: Marc-Aurèle Brothier
>
> If you lose communication between the management server and one of the
> agents for a few minutes, the HighAvailabilityManager kicks in and starts
> scheduling VM restarts, even though HA mode is not active. These tasks are
> inserted as async jobs in the DB, and if the agent comes back online while
> the jobs are still in the async job table, they are pushed to the agent and
> shut down the VMs. Then, since HA is not active, the VMs are not restarted.
> In my opinion, the expected behavior is that VMs without HA mode active should
> not be restarted at all, and the agent should be left to update the VM state
> through its power report.
> The bug lies in
> {{HighAvailabilityManagerImpl.scheduleRestartForVmsOnHost(final HostVO host,
> boolean investigate)}}; a PR will follow.
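A minimal sketch of the expected behavior described in the report, assuming simplified hypothetical types: Vm, haEnabled and vmsToRestart are illustrative names only, not CloudStack's VMInstanceVO/HostVO classes and not the contents of the upcoming PR. The idea is that only VMs with HA enabled get a restart work item when their host drops off; for the others no job is queued, so nothing can be pushed to the agent when it reconnects, and their state is left to the agent's power report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model: these types are NOT CloudStack's real classes.
public class HaRestartFilter {

    /** Minimal stand-in for a VM record known to the management server. */
    static final class Vm {
        final long id;
        final boolean haEnabled; // assumed stand-in for the VM's HA flag

        Vm(long id, boolean haEnabled) {
            this.id = id;
            this.haEnabled = haEnabled;
        }
    }

    /**
     * Given the VMs on a host that lost contact with the management server,
     * return the ids of the VMs that should get an HA restart work item.
     * VMs without HA are skipped: no stop/restart job is queued for them,
     * so their state is reconciled later from the agent's power report.
     */
    static List<Long> vmsToRestart(List<Vm> vmsOnHost) {
        List<Long> toRestart = new ArrayList<>();
        for (Vm vm : vmsOnHost) {
            if (!vm.haEnabled) {
                continue; // leave state reconciliation to the power report
            }
            toRestart.add(vm.id);
        }
        return toRestart;
    }

    public static void main(String[] args) {
        List<Vm> vms = List.of(new Vm(1, true), new Vm(2, false), new Vm(3, true));
        System.out.println(vmsToRestart(vms)); // prints [1, 3]
    }
}
```

Keeping the HA check at scheduling time (rather than when the job is later executed) means a reconnecting agent never receives a stop for a non-HA VM in the first place.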
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)