Likitha Shetty created CLOUDSTACK-7415:
------------------------------------------
Summary: Host remains in Alert after vCenter restart
Key: CLOUDSTACK-7415
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7415
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: Management Server
Affects Versions: 4.0.0
Reporter: Likitha Shetty
Assignee: Likitha Shetty
Priority: Critical
Fix For: 4.5.0
In a clustered management server environment, some hosts repeatedly go back
into Alert state after a vCenter restart, even once vCenter is back up.
The root cause is the following race condition:
A scheduled PingTask runs for every host, at an interval set by the global
configuration parameter ping.interval. When vCenter is restarted, the PingTask
cannot get the host status, so it schedules another task to handle the
disconnect of the host agent.
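A minimal sketch of the per-host ping described above, assuming a plain ScheduledExecutorService. PingTaskSketch, pingTask, statusReadable and queuedDisconnects are hypothetical names, not CloudStack's actual classes; on failure the sketch only records the host ID where the real code would schedule the disconnect task.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical model of a per-host PingTask; not CloudStack's real code.
final class PingTaskSketch {
    // Host IDs for which a disconnect-handling task was queued (thread-safe list).
    static final List<Long> queuedDisconnects = new CopyOnWriteArrayList<>();

    // One run of the per-host ping. statusReadable models whether the host
    // status could be fetched (false while vCenter is restarting).
    static Runnable pingTask(long hostId, BooleanSupplier statusReadable) {
        return () -> {
            if (!statusReadable.getAsBoolean()) {
                // Stand-in for scheduling the separate disconnect task.
                queuedDisconnects.add(hostId);
            }
        };
    }

    // Run the ping repeatedly at the configured interval (ping.interval, in seconds).
    static ScheduledFuture<?> schedule(ScheduledExecutorService ses, long hostId,
                                       long pingIntervalSeconds, BooleanSupplier statusReadable) {
        return ses.scheduleAtFixedRate(pingTask(hostId, statusReadable),
                pingIntervalSeconds, pingIntervalSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        pingTask(1L, () -> true).run();   // status readable: nothing queued
        pingTask(2L, () -> false).run();  // vCenter down: disconnect queued for host 2
        System.out.println(queuedDisconnects); // [2]

        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ScheduledFuture<?> ping = schedule(ses, 3L, 60, () -> true);
        ping.cancel(false); // what disconnect processing later does to this task
        ses.shutdown();
    }
}
```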
This disconnect task determines the host status by sending a CheckHealthCommand
to the agent. When the answer says the resource is not alive, CloudStack
investigates further; in this case the VMware investigator confirms that the
host is disconnected. The disconnect is then processed, which involves the
following:
1. Cancel all scheduled tasks for that agent, including the PingTask.
2. Send the disconnect to all listeners, including AgentMonitor, which removes
the agent from the management server's PingMap.
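The disconnect path can be modelled roughly as follows. This is a hypothetical sketch: DisconnectSketch, agentTasks, listeners and the two BooleanSupplier arguments stand in for the CheckHealthCommand answer, the VMware investigator and the AgentMonitor listener; none of them are CloudStack APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

// Hypothetical sketch of disconnect processing; names are illustrative only.
final class DisconnectSketch {
    // Stand-in for the management server's PingMap: hostId -> last ping time (ms).
    static final Map<Long, Long> pingMap = new ConcurrentHashMap<>();
    // Scheduled tasks per agent; the PingTask is one of them.
    static final Map<Long, List<Future<?>>> agentTasks = new ConcurrentHashMap<>();
    // Disconnect listeners, notified in registration order.
    static final List<LongConsumer> listeners = new CopyOnWriteArrayList<>();

    static {
        // AgentMonitor-like listener: forget the agent's last ping.
        listeners.add(hostId -> pingMap.remove(hostId));
    }

    // healthCheckAlive models the CheckHealthCommand answer; investigatorSaysDown
    // models the hypervisor investigator and is only consulted if the check fails.
    static boolean processDisconnect(long hostId, BooleanSupplier healthCheckAlive,
                                     BooleanSupplier investigatorSaysDown) {
        if (healthCheckAlive.getAsBoolean() || !investigatorSaysDown.getAsBoolean()) {
            return false; // not confirmed disconnected: leave the agent alone
        }
        // 1. Cancel all scheduled tasks for the agent, including its PingTask.
        List<Future<?>> tasks = agentTasks.remove(hostId);
        if (tasks != null) {
            tasks.forEach(t -> t.cancel(false));
        }
        // 2. Notify every listener; the AgentMonitor-like one clears the PingMap entry.
        listeners.forEach(l -> l.accept(hostId));
        return true;
    }

    public static void main(String[] args) {
        FutureTask<Void> ping = new FutureTask<>(() -> { }, null);
        agentTasks.put(5L, new CopyOnWriteArrayList<>(List.of(ping)));
        pingMap.put(5L, System.currentTimeMillis());

        boolean processed = processDisconnect(5L, () -> false, () -> true);
        System.out.println(processed + " " + ping.isCancelled() + " " + pingMap.containsKey(5L));
        // true true false
    }
}
```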
If this disconnect processing is delayed long enough to spill over into the
next PingTask interval, the next PingTask runs. If vCenter is up by then and the
host is connected, the ping succeeds and an entry for the agent is added to the
PingMap.
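One interleaving consistent with this description can be replayed step by step in a single thread (hypothetical names throughout). The sketch assumes that the cancellation does not interrupt a PingTask run already in progress and that the ping path does not re-check whether the agent was disconnected; the timestamps assume a 60-second ping.interval.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, single-threaded replay of the two orderings; names are illustrative.
final class PingRaceSketch {
    private final Map<Long, Long> pingMap = new ConcurrentHashMap<>();
    private boolean pingTaskCancelled = false;

    // Disconnect processing: cancel the PingTask, then clear the PingMap entry.
    void processDisconnect(long hostId) {
        pingTaskCancelled = true; // assumed non-interrupting: an in-flight run still completes
        pingMap.remove(hostId);
    }

    // Last step of a PingTask run that started before the cancel and found vCenter up.
    void completeSuccessfulPing(long hostId, long nowMillis) {
        pingMap.put(hostId, nowMillis); // assumption: no re-check of the agent's state here
    }

    boolean hasEntry(long hostId) {
        return pingMap.containsKey(hostId);
    }

    // Ping completes before the disconnect: the disconnect clears the entry.
    static boolean staleEntryWhenPingFirst() {
        PingRaceSketch s = new PingRaceSketch();
        s.completeSuccessfulPing(1L, 1_000L);
        s.processDisconnect(1L);
        return s.hasEntry(1L);
    }

    // Disconnect spills into the next interval: the in-flight ping writes after the clear,
    // so an entry exists even though the PingTask was cancelled.
    static boolean staleEntryWhenDisconnectLate() {
        PingRaceSketch s = new PingRaceSketch();
        s.processDisconnect(1L);
        s.completeSuccessfulPing(1L, 61_000L);
        return s.hasEntry(1L) && s.pingTaskCancelled;
    }

    public static void main(String[] args) {
        System.out.println(staleEntryWhenPingFirst());      // false
        System.out.println(staleEntryWhenDisconnectLate()); // true
    }
}
```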
Once an entry is added to the PingMap after the disconnect, the AgentMonitor
task, which runs every minute, finds the agent behind on ping. Because the
attache is no longer connected, it disconnects the host agent without
investigation and puts the host back into Alert state.
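A hypothetical sketch of one AgentMonitor pass over the PingMap. scan, staleAfterMillis and connectedAttaches are illustrative names; the real staleness threshold and the handling of a still-connected attache are not described in this report, so the INVESTIGATE branch is an assumption included only for contrast.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the AgentMonitor check; not CloudStack's real classes.
final class AgentMonitorSketch {
    enum Action { INVESTIGATE, ALERT_WITHOUT_INVESTIGATION }

    // One monitor pass. pingMap: hostId -> last ping time (ms);
    // connectedAttaches: hosts whose attache is still connected.
    static Map<Long, Action> scan(Map<Long, Long> pingMap, Set<Long> connectedAttaches,
                                  long nowMillis, long staleAfterMillis) {
        Map<Long, Action> actions = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<Long, Long> e : pingMap.entrySet()) {
            boolean behindOnPing = nowMillis - e.getValue() > staleAfterMillis;
            if (!behindOnPing) {
                continue;
            }
            long hostId = e.getKey();
            // Attache gone: disconnect straight away and put the host into Alert.
            actions.put(hostId, connectedAttaches.contains(hostId)
                    ? Action.INVESTIGATE
                    : Action.ALERT_WITHOUT_INVESTIGATION);
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<Long, Long> pingMap = new ConcurrentHashMap<>(Map.of(1L, 0L, 2L, 0L, 3L, 170_000L));
        Set<Long> connected = Set.of(2L, 3L);
        System.out.println(scan(pingMap, connected, 180_000L, 150_000L));
        // {1=ALERT_WITHOUT_INVESTIGATION, 2=INVESTIGATE}
    }
}
```

Because nothing in this pass removes the stale entry, a host in this state would be flagged again on every run, which is consistent with the repeated Alert transitions reported above.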
--
This message was sent by Atlassian JIRA
(v6.2#6252)