GitHub user borisroman opened a pull request: https://github.com/apache/cloudstack/pull/863
[BLOCKER][4.6]CLOUDSTACK-8883: Resolved connect/reconnect issue.

Hi! @wilderrodrigues by implementing Callable you switched a couple of methods and fields. I switched them some more!

The Agent wouldn't reconnect for two reasons.

Problem 1: Selector was blocking.
In the while loop at [1], _selector.select(); blocked when the connection was lost. This means _isStartup = false; at [2] was never executed. Therefore the call to isStartup() at [3] always returned true, resulting in an infinite loop.

Resolution 1: Move the call to cleanUp() [4] before checking if isStartup() has turned to false. cleanUp() will close() the _selector, which results in _isStartup being set to false.

Problem 2: Setting _isStartup & _isRunning to true when init() threw an exception (ConnectException).
The exception was caught, but only logged; no action was taken. As a result, _isStartup & _isRunning were still set to true, so the Agent thought it had connected successfully when it hadn't.

Resolution 2: Add a return to the catch statement [5]. This way _isStartup & _isRunning aren't set to true.

Steps to test:
1. Deploy ACS.
2. Try all combinations of stopping/starting management server/agent.
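
The two resolutions described above can be illustrated in isolation. This is a minimal sketch, not the actual NioConnection or Agent code: SketchConnection, ReconnectSketch, waitForShutdown and the reachable flag are hypothetical names introduced only for illustration, and cleanUp() here clears the flags directly, whereas in the PR it closes the selector so that the blocked loop can exit and clear _isStartup itself.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical stand-in for NioConnection; only the flag handling is modelled.
class SketchConnection {
    private final boolean reachable;            // assumption: simulates whether the server accepts connections
    private volatile boolean isStartup = false;
    private volatile boolean isRunning = false;

    SketchConnection(boolean reachable) {
        this.reachable = reachable;
    }

    // Simulated init(): fails the same way a refused connection would.
    private void init() throws IOException {
        if (!reachable) {
            throw new ConnectException("Connection refused");
        }
    }

    void start() {
        try {
            init();
        } catch (IOException e) {
            System.err.println("Unable to connect: " + e.getMessage());
            // Resolution 2: return here so the flags below are never set
            // for a connection that was not established.
            return;
        }
        isStartup = true;
        isRunning = true;
    }

    // Simplification: the real cleanUp() closes the selector, which unblocks
    // select() and lets the connection loop reset the startup flag.
    void cleanUp() {
        isStartup = false;
        isRunning = false;
    }

    boolean isStartup() { return isStartup; }
    boolean isRunning() { return isRunning; }
}

// Hypothetical stand-in for the reconnect logic in Agent.
class ReconnectSketch {
    // Resolution 1: clean up first, then wait for isStartup() to turn false.
    // Returns how many times the wait loop had to spin (bounded by maxSpins).
    static int waitForShutdown(SketchConnection connection, int maxSpins) {
        connection.cleanUp();
        int spins = 0;
        while (connection.isStartup() && spins < maxSpins) {
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) {
        SketchConnection down = new SketchConnection(false);
        down.start();
        System.out.println("unreachable server, isStartup = " + down.isStartup());

        SketchConnection up = new SketchConnection(true);
        up.start();
        System.out.println("reachable server, isStartup = " + up.isStartup());
        System.out.println("spins after cleanUp = " + waitForShutdown(up, 5));
    }
}
```

Without the return in the catch block, start() would fall through and mark an unreachable connection as started. Without calling cleanUp() before the wait loop, nothing would ever clear the startup flag, so the loop would only end at the spin limit used in this sketch.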

[1] https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/utils/src/main/java/com/cloud/utils/nio/NioConnection.java#L128
[2] https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/utils/src/main/java/com/cloud/utils/nio/NioConnection.java#L176
[3] https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/agent/src/com/cloud/agent/Agent.java#L404
[4] https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/agent/src/com/cloud/agent/Agent.java#L399
[5] https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/utils/src/main/java/com/cloud/utils/nio/NioConnection.java#L91

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/borisroman/cloudstack CLOUDSTACK-8883

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cloudstack/pull/863.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #863

----
commit 9693b97c2147b3fdb9579a1ebb33597cd3bf1d11
Author: Boris Schrijver <bo...@pcextreme.nl>
Date:   2015-09-21T14:54:56Z

    Call cleanUp() before looping isStartup().

commit b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e
Author: Boris Schrijver <bo...@pcextreme.nl>
Date:   2015-09-21T22:38:16Z

    Added return statement to stop start() if there has been an ConnectException.

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---