[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-7415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108874#comment-14108874
 ] 

ASF subversion and git services commented on CLOUDSTACK-7415:
-------------------------------------------------------------

Commit 8ce6eba549bcd3fa007aaf10a29c3a2fef9ffaaa in cloudstack's branch 
refs/heads/master from [~likithas]
[ https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;h=8ce6eba ]

CLOUDSTACK-7415. Host remains in Alert after vCenter restart.
Management server PingTask should update PingMap entry for an agent only if it 
is already present in the Management Server's PingMap.
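
A minimal sketch of the guarded update described in the commit message, not the
actual CloudStack change. The class name, method names, and the map layout
(agent id to last ping time in seconds) are assumptions made for illustration.
It relies on ConcurrentMap.replace(key, value), which updates a mapping only
when the key is already present.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the management server's ping bookkeeping.
class GuardedPingMapSketch {
    // agent id -> time of the last accepted ping, in seconds (assumed layout)
    private final ConcurrentMap<Long, Long> pingMap = new ConcurrentHashMap<>();

    // Called when an agent connects: this is the only place an entry is created.
    void onConnect(long agentId, long nowSeconds) {
        pingMap.put(agentId, nowSeconds);
    }

    // Called when a disconnect is processed: the entry is cleared.
    void onDisconnect(long agentId) {
        pingMap.remove(agentId);
    }

    // Called by a ping task. replace() atomically updates the entry only if
    // the agent is already mapped, so a late ping that arrives after a
    // disconnect cannot re-create the entry. Returns true if the ping was
    // recorded.
    boolean onPing(long agentId, long nowSeconds) {
        return pingMap.replace(agentId, nowSeconds) != null;
    }

    boolean isTracked(long agentId) {
        return pingMap.containsKey(agentId);
    }
}
```

With this guard, calling onPing for an agent after onDisconnect returns false
and leaves the map empty for that agent, so a later monitor sweep has no stale
entry to act on.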


> Host remains in Alert after vCenter restart
> -------------------------------------------
>
>                 Key: CLOUDSTACK-7415
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7415
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: Management Server
>    Affects Versions: 4.0.0
>            Reporter: Likitha Shetty
>            Assignee: Likitha Shetty
>            Priority: Critical
>             Fix For: 4.5.0
>
>
> In a clustered management server environment, some hosts repeatedly return 
> to the Alert state after a vCenter restart, even once vCenter is back up.
> The root cause is the following race condition:
> A scheduled PingTask runs for every host at a configurable interval (global 
> config ping.interval). When vCenter is restarted, PingTask cannot get the 
> host status, so it schedules another task to handle the disconnect for the 
> host agent.
> This disconnect task determines the host status by sending 
> CheckHealthCommand to the agent. When the command returns an answer saying 
> that the resource is not alive, CS investigates further, and in this case 
> the VMware investigator confirms that the host is disconnected. The 
> disconnect is then processed, which involves the following:
> 1. Cancel all scheduled tasks for that agent which includes PingTask
> 2. Send disconnect to all listeners including AgentMonitor which clears the 
> agent from MS's PingMap
> If this disconnect takes a while to get scheduled and spills over into the 
> next PingTask interval, the next PingTask still runs. If vCenter is up and 
> the host is connected by then, the ping succeeds, and an entry for the 
> agent is made in the PingMap.
> Once an entry is made in the PingMap after a disconnect, the AgentMonitor 
> task, which runs every minute, finds the agent behind on ping, disconnects 
> the host agent without investigation because the attache is no longer 
> connected, and puts the host back into the Alert state.
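
The sequence in the description can be replayed deterministically in a small
sketch. This is an illustration under stated assumptions, not CloudStack code:
the class, its methods, the seconds-based timestamps, and the timeout value
are all hypothetical. The ping update here is unconditional, which is the
behaviour the description identifies as racy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the ping bookkeeping with an unconditional update.
class PingRaceSketch {
    // agent id -> time of the last successful ping, in seconds (assumed layout)
    final Map<Long, Long> pingMap = new ConcurrentHashMap<>();

    // Unconditional update: creates the entry if it is missing.
    void recordPing(long agentId, long nowSeconds) {
        pingMap.put(agentId, nowSeconds);
    }

    // Disconnect processing clears the agent from the map.
    void processDisconnect(long agentId) {
        pingMap.remove(agentId);
    }

    // Monitor sweep: returns agents whose last ping is older than the timeout.
    List<Long> findAgentsBehindOnPing(long nowSeconds, long timeoutSeconds) {
        List<Long> behind = new ArrayList<>();
        for (Map.Entry<Long, Long> entry : pingMap.entrySet()) {
            if (nowSeconds - entry.getValue() > timeoutSeconds) {
                behind.add(entry.getKey());
            }
        }
        return behind;
    }
}
```

Replaying the order of events from the description (a ping at time 0, the
delayed disconnect, then a late ping at time 60) leaves an entry that no
further ping refreshes, so a sweep at time 200 with a 90-second timeout reports
the agent as behind on ping.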



--
This message was sent by Atlassian JIRA
(v6.2#6252)
