All,

Historically, when the agent (kvm, ssvm, cpvm) is disconnected from the 
management server (e.g. due to a mgmt server restart), the reconnection logic 
waits for any pending tasks/commands to complete before reconnection attempts 
are made. I searched the git history but could not find a reason for this. Can 
anyone share why we may need it?


Based on the reported issue:

https://github.com/apache/cloudstack/issues/2633


I've a working patch which removes this limitation:

https://github.com/apache/cloudstack/pull/2638


From testing with various combinations of tasks, I found that when the agent 
is disconnected, even if the pending task succeeds, the agent fails to send an 
Answer to the mgmt server. From the control plane's perspective that task is 
therefore still pending/on-going.


When the mgmt server comes back online and the agent finally reconnects 
(depending on how long the pending task took), the executed operation is still 
pending in the mgmt server's view and may sometimes require manual cleanup in 
the database. By removing the limitation in the above PR, at least the agent 
reconnects faster, while the rest of the failure/fault behaviours remain the 
same. A bigger design fix would be to make the management server's handling of 
agent-side answers/responses asynchronous.
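
To make the difference concrete, here is a toy model of the two behaviours. It 
is only a sketch and not the actual agent code: the function name, its 
parameters and the one-second retry interval are all made up for illustration, 
and time is simulated rather than measured.

```python
def reconnect_time(pending_task_seconds, server_back_after, wait_for_pending):
    """Return the simulated second at which the agent reconnects.

    Hypothetical model, not CloudStack code:
      pending_task_seconds - time the in-flight task still needs to finish
      server_back_after    - time until the mgmt server is reachable again
      wait_for_pending     - True models the historical behaviour,
                             False models the behaviour with the limitation removed
    """
    now = 0
    if wait_for_pending:
        # Historical behaviour: no reconnection attempt is made until the
        # pending task has completed.
        now = pending_task_seconds
    # Retry once per simulated second until the mgmt server is reachable.
    while now < server_back_after:
        now += 1
    return now
```

With a task that needs 600 seconds and a mgmt server that is back after 30 
seconds, the first variant reconnects at 600 and the second at 30. In both 
cases the Answer for the pending task is still lost, which is the part that 
would need the bigger design fix.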


- Rohit

<https://cloudstack.apache.org>



rohit.ya...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue
  
 
