rhtyd edited a comment on issue #3505: Agent LB for CloudStack failed
URL: https://github.com/apache/cloudstack/issues/3505#issuecomment-515899515

From the email thread on dev@:

```
2019-07-18 15:26:23,420 INFO [utils.nio.NioClient] (Agent-Handler-2:null) (logid:) Connecting to 172.17.1.142:8250
2019-07-18 15:26:26,427 ERROR [utils.nio.NioConnection] (Agent-Handler-2:null) (logid:) Unable to initialize the threads.
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:454)
    at sun.nio.ch.Net.connect(Net.java:446)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
    at com.cloud.utils.nio.NioClient.init(NioClient.java:56)
    at com.cloud.utils.nio.NioConnection.start(NioConnection.java:95)
    at com.cloud.agent.Agent.reconnect(Agent.java:517)
    at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1091)
    at com.cloud.utils.nio.Task.call(Task.java:83)
    at com.cloud.utils.nio.Task.call(Task.java:29)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2019-07-18 15:26:26,432 INFO [utils.exception.CSExceptionErrorCode] (Agent-Handler-2:null) (logid:) Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions
2019-07-18 15:26:26,432 WARN [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) NIO Connection Exception com.cloud.utils.exception.NioConnectionException: No route to host
2019-07-18 15:26:26,432 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again...
2019-07-18 15:26:26,432 INFO [utils.nio.NioClient] (Agent-Handler-2:null) (logid:) NioClient connection closed
2019-07-18 15:26:31,433 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Reconnecting to host:172.17.1.141
```

The exception is thrown after attempting to connect for 3 minutes, which is a reasonable timeout. Once the agent decides to reconnect, it sleeps for up to 3 minutes, depending on the configured backoff/sleep algorithm:
https://github.com/apache/cloudstack/blob/master/agent/src/main/java/com/cloud/agent/Agent.java#L528

From the logs, it seems the KVM host was disconnected from the management server host for only `6mins`, not 15 minutes. I think it is perfectly reasonable to wait a few minutes before the KVM agent decides to switch. Instantaneous switching between management servers, without proper socket and sleep timeouts, can cause a large number of ownership switches and a lot of management traffic.
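To make the trade-off concrete, here is a minimal sketch of a reconnect schedule that waits between attempts and rotates through the management servers. It is not CloudStack's `Agent` code: the class `ReconnectSketch`, its methods, and the capped exponential policy are all hypothetical, and the configured backoff algorithm in a real deployment may behave differently. The 5 second base delay matches the gap between the last two log lines above, and the 3 minute cap matches the upper bound mentioned in the comment.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not taken from CloudStack's Agent.java. */
public class ReconnectSketch {

    /** Delay before the given attempt: base * 2^attempt, capped at maxMillis. */
    public static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        // Limit the shift so the multiplier stays well inside the long range.
        int shift = Math.min(attempt, 30);
        long delay = baseMillis * (1L << shift);
        return Math.min(delay, maxMillis);
    }

    /** Round-robin choice of the management server to try on this attempt. */
    public static String nextHost(List<String> hosts, int attempt) {
        return hosts.get(Math.floorMod(attempt, hosts.size()));
    }

    public static void main(String[] args) {
        // Host addresses taken from the log excerpt; the order is an assumption.
        List<String> hosts = Arrays.asList("172.17.1.142", "172.17.1.141");
        for (int attempt = 0; attempt < 7; attempt++) {
            System.out.printf("attempt %d: try %s after %d ms%n",
                    attempt, nextHost(hosts, attempt),
                    backoffMillis(attempt, 5_000L, 180_000L));
        }
    }
}
```

With a cap in place, an agent that cannot reach any management server settles into one attempt every 3 minutes instead of switching continuously, which is the behaviour argued for above.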
