[jira] [Created] (CLOUDSTACK-6857) Losing the connection from CloudStack Manager to the agent will force a shutdown when connection is re-established

c-hemp (JIRA) Fri, 06 Jun 2014 08:28:41 -0700

c-hemp created CLOUDSTACK-6857:
----------------------------------

             Summary: Losing the connection from CloudStack Manager to the 
agent will force a shutdown when connection is re-established
                 Key: CLOUDSTACK-6857
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6857
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Management Server
    Affects Versions: 4.3.0
         Environment: Ubuntu 12.04
            Reporter: c-hemp
            Priority: Critical



If a physical host is not pingable, that host goes into alert mode. If the 
physical host is unreachable, the virtual router is either unreachable or 
unable to ping a virtual instance on the physical host, and the manager is 
unable to ping the virtual instance, it assumes the host is down and puts the 
instance into a stopped state.

When the connection is re-established, it gets the state from the database, 
sees that the instance is now in a stopped state, and then shuts down the 
instance.

This behavior can cause major outages after any type of network loss, once 
connectivity comes back.  This is especially critical when running CloudStack 
across multiple colos.

The logs when it happens:
2014-06-06 02:01:22,259 INFO  [c.c.h.HighAvailabilityManagerImpl] 
(HA-Worker-1:ctx-be848615 work-1953) PingInvestigator found 
VM[User|cephvmstage013]to be alive? null
2014-06-06 02:01:22,259 DEBUG [c.c.h.ManagementIPSystemVMInvestigator] 
(HA-Worker-1:ctx-be848615 work-1953) Not a System Vm, unable to determine state 
of VM[User|cephvmstage013] returning null
2014-06-06 02:01:22,259 DEBUG [c.c.h.ManagementIPSystemVMInvestigator] 
(HA-Worker-1:ctx-be848615 work-1953) Testing if VM[User|cephvmstage013] is alive
2014-06-06 02:01:22,260 DEBUG [c.c.h.ManagementIPSystemVMInvestigator] 
(HA-Worker-1:ctx-be848615 work-1953) Unable to find a management nic, cannot 
ping this system VM, unable to determine state of VM[User|cephvmstage013] 
returning null
2014-06-06 02:01:22,260 INFO  [c.c.h.HighAvailabilityManagerImpl] 
(HA-Worker-1:ctx-be848615 work-1953) ManagementIPSysVMInvestigator found 
VM[User|cephvmstage013]to be alive? null
2014-06-06 02:01:22,263 INFO  [c.c.h.HighAvailabilityManagerImpl] 
(HA-Worker-4:ctx-e8eea7fb work-1950) KVMInvestigator found 
VM[User|cephvmstage013]to be alive? null
2014-06-06 02:01:22,263 INFO  [c.c.h.HighAvailabilityManagerImpl] 
(HA-Worker-4:ctx-e8eea7fb work-1950) HypervInvestigator found 
VM[User|cephvmstage013]to be alive? null
2014-06-06 02:01:22,419 INFO  [c.c.h.HighAvailabilityManagerImpl] 
(HA-Worker-1:ctx-be848615 work-1953) KVMInvestigator found 
VM[User|cephvmstage013]to be alive? null
2014-06-06 02:01:22,419 INFO  [c.c.h.HighAvailabilityManagerImpl] 
(HA-Worker-1:ctx-be848615 work-1953) HypervInvestigator found 
VM[User|cephvmstage013]to be alive? null
2014-06-06 02:01:22,584 WARN  [c.c.v.VirtualMachineManagerImpl] 
(HA-Worker-1:ctx-be848615 work-1953) Unable to actually stop 
VM[User|cephvmstage013] but continue with release because it's a force stop
2014-06-06 02:01:22,585 DEBUG [c.c.v.VirtualMachineManagerImpl] 
(HA-Worker-1:ctx-be848615 work-1953) VM[User|cephvmstage013] is stopped on the 
host.  Proceeding to release resource held.
2014-06-06 02:01:22,648 WARN  [c.c.v.VirtualMachineManagerImpl] 
(HA-Worker-4:ctx-e8eea7fb work-1950) Unable to actually stop 
VM[User|cephvmstage013] but continue with release because it's a force stop
2014-06-06 02:01:22,650 DEBUG [c.c.v.VirtualMachineManagerImpl] 
(HA-Worker-4:ctx-e8eea7fb work-1950) VM[User|cephvmstage013] is stopped on the 
host.  Proceeding to release resource held.
2014-06-06 02:01:22,704 DEBUG [c.c.v.VirtualMachineManagerImpl] 
(HA-Worker-4:ctx-e8eea7fb work-1950) Successfully released network resources 
for the vm VM[User|cephvmstage013]
2014-06-06 02:01:22,704 DEBUG [c.c.v.VirtualMachineManagerImpl] 
(HA-Worker-4:ctx-e8eea7fb work-1950) Successfully released storage resources 
for the vm VM[User|cephvmstage013]
2014-06-06 02:01:22,774 DEBUG [c.c.v.VirtualMachineManagerImpl] 
(HA-Worker-1:ctx-be848615 work-1953) Successfully released network resources 
for the vm VM[User|cephvmstage013]
2014-06-06 02:01:22,774 DEBUG [c.c.v.VirtualMachineManagerImpl] 
(HA-Worker-1:ctx-be848615 work-1953) Successfully released storage resources 
for the vm VM[User|cephvmstage013]
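
The "alive? null" lines above suggest that each investigator can answer alive, 
not alive, or unknown. Below is a minimal Python sketch, not CloudStack's 
actual code, of treating an all-unknown result as its own outcome instead of 
as "down". The function names and action strings are hypothetical.

```python
from typing import Callable, Iterable, Optional

# Hypothetical model: an investigator returns True (alive), False (not alive),
# or None (could not determine), mirroring the "alive? null" log lines.
Investigator = Callable[[str], Optional[bool]]


def investigate(vm: str, investigators: Iterable[Investigator]) -> Optional[bool]:
    """Return the first definite answer, or None if no investigator could tell."""
    for check in investigators:
        result = check(vm)
        if result is not None:
            return result
    return None


def next_action(alive: Optional[bool]) -> str:
    """Map the tri-state result to an action (action names are illustrative)."""
    if alive is True:
        return "keep-running"
    if alive is False:
        return "treat-as-stopped"
    # Unknown is not the same as stopped: raise an alert and wait.
    return "alert-and-wait"


def unknown(vm: str) -> Optional[bool]:
    # Stand-in for an investigator that cannot reach the VM.
    return None


print(next_action(investigate("cephvmstage013", [unknown, unknown, unknown])))
# prints "alert-and-wait"
```

The point of the sketch is only that "no investigator could tell" is kept 
separate from "the VM is down", so a force stop is not the default outcome.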

The behavior should change so that the instance is set to an alert state. 
Then, once connectivity is re-established, if the instance is up, the manager 
should be updated with the running status.
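
The requested behavior can be sketched as a small state model. This is an 
illustration in Python under assumed names (VmState, on_connection_lost, 
on_reconnect), not CloudStack's API or its actual state machine.

```python
from enum import Enum


class VmState(Enum):
    RUNNING = "Running"
    STOPPED = "Stopped"
    ALERT = "Alert"


def on_connection_lost(recorded: VmState) -> VmState:
    """Requested behavior: record Alert, not Stopped, when the VM is unreachable."""
    return VmState.ALERT if recorded is VmState.RUNNING else recorded


def on_reconnect(recorded: VmState, running_on_host: bool) -> VmState:
    """Requested behavior: for a VM left in Alert, trust what the host reports."""
    if recorded is VmState.ALERT:
        return VmState.RUNNING if running_on_host else VmState.STOPPED
    return recorded


state = on_connection_lost(VmState.RUNNING)        # network loss
state = on_reconnect(state, running_on_host=True)  # connectivity restored
print(state.value)  # prints "Running"
```

With this model a VM that kept running through the outage returns to Running 
in the manager's records, and nothing is shut down on reconnect.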



--
This message was sent by Atlassian JIRA
(v6.2#6252)
