Hi Martin,

Thanks for sharing. I'm not sure what's causing the issue, but based on the logs it seems that only the KVM agents are having trouble connecting to the mgmt server, as I don't see any Nio-related exceptions in the management server logs. I could not see the cloudstack-agent version in the logs; I'm assuming that they were all upgraded to 4.9.0 and that there are no conflicting jars at /usr/share/cloudstack-agent/lib.

First, can you make sure the mgmt server has a high enough ulimit? I found that the Ubuntu/Debian init.d script doesn't override this, while the CentOS initd/systemd scripts set ulimit. On your mgmt server, edit /etc/init.d/cloudstack-management and add ulimit -n 10240 just before the mgmt server is started in the 'start' section (for me it was at around line #147, where it logs a message that it's starting the cloudstack-management server).

Next, if this still does not solve the issue: I created a special cloud-utils.jar for you that you need to place on your mgmt server and on the KVM agents, and then restart the mgmt server. This will increase the verbosity of the error and reduce the Nio polling loop timeout (from 100ms to 10ms). On the KVM agents, the error from the logs is that the inbound connection/stream gets closed during the SSL handshake, and we want to know the exception message.

Please get the jar from here:
https://github.com/rhtyd/cloudstack/releases/tag/4.9.0-nioinbound

and place it at:
/usr/share/cloudstack-agent/lib/ (on kvm host)
/usr/share/cloudstack-management/webapps/client/WEB-INF/lib/ (on mgmt server host)

Let me know what worked for you, and if it still fails, can you share the mgmt server and agent logs once again? Thanks.

Regards.

________________________________
From: martin kolly <martin.ko...@senselan.ch>
Sent: 25 August 2016 20:50:08
To: dev@cloudstack.apache.org
Subject: Re: CS 4.9 NIO Selector wait time PR-1601

Hi Rohit

We are running java version 1.7.0_111 on KVM and management server.

mgmt# java -version
java version "1.7.0_111"

kvm# java -version
java version "1.7.0_111"

We get the same error message. Attached are the logs with TRACE enabled.

"success consists of going from failure to failure without loss of enthusiasm."

regards
martin

On 08/25/2016 02:02 PM, Rohit Yadav wrote:

Hi Martin,

Thanks for sharing. On the surface there does not seem to be any configuration issue causing the failures. I'm personally running a KVM and Ubuntu host based env without issues; I'm on Ubuntu 14.04.4 (Linux bluebox 3.16.0-45-generic #60~14.04.1-Ubuntu) and java 1.7.0_79. Can you try upgrading your JRE7 to the latest (openjdk-7-jre, 7u111-2.6.7-0ubuntu0.14.04.3) on all mgmt server and kvm hosts?

If upgrading your JRE does not help, can you increase the logging verbosity for both the agent and management server (in /etc/cloudstack/{agent, management} there is a log4j file; edit that and replace DEBUG/INFO with TRACE for the class/keys com.cloud and org.apache.cloudstack) and re-share the logs when the failures occur? I want to see what additional information we can get from the logs when it tries to connect to host 10.100.12.10 on port 8250.

Regards.

________________________________
From: martin kolly <martin.ko...@senselan.ch>
Sent: 25 August 2016 17:11:06
To: dev@cloudstack.apache.org
Subject: Re: CS 4.9 NIO Selector wait time PR-1601

@Simon: We have one management server with local DB. KVMs connect directly to the management server without any security/loadbalancing device.

Thanks
Martin

On 08/25/2016 12:41 PM, Simon Weller wrote:

Martin,

Can you provide more detail about your haproxy setup? Are you running it on separate servers, or on the management server itself?

- Si

Simon Weller/ENA
(615) 312-6068

rohit.ya...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS UK
@shapeblue

-----Original Message-----
From: martin kolly [martin.ko...@senselan.ch]
Received: Thursday, 25 Aug 2016, 5:04AM
To: Rohit Yadav [rohit.ya...@shapeblue.com]; dev@cloudstack.apache.org
Subject: Re: CS 4.9 NIO Selector wait time PR-1601

thanks for your reply.

This morning we repeated the upgrade process from 4.8 to 4.9 with the following repository: http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/

Unfortunately we ran into the same issue:

2016-08-25 09:49:00,660 INFO [utils.nio.NioClient] (main:null) (logid:) Connecting to 10.100.12.10:8250
2016-08-25 09:49:00,668 WARN [utils.nio.Link] (main:null) (logid:) This SSL engine was forced to close inbound due to end of stream.
2016-08-25 09:49:00,668 ERROR [utils.nio.NioClient] (main:null) (logid:) SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
2016-08-25 09:49:00,668 ERROR [utils.nio.NioConnection] (main:null) (logid:) Unable to initialize the threads.
java.io.IOException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
    at com.cloud.utils.nio.NioClient.init(NioClient.java:67)
    at com.cloud.utils.nio.NioConnection.start(NioConnection.java:88)
    at com.cloud.agent.Agent.start(Agent.java:237)
    at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399)
    at com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:367)
    at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:351)
    at com.cloud.agent.AgentShell.start(AgentShell.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
2016-08-25 09:49:00,669 INFO [utils.exception.CSExceptionErrorCode] (main:null) (logid:) Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions
2016-08-25 09:49:00,669 WARN [cloud.agent.Agent] (main:null) (logid:) NIO Connection Exception com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
2016-08-25 09:49:00,670 INFO [cloud.agent.Agent] (main:null) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again...

KVM Hosts:

# java -version
java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)

# dpkg --get-selections | grep -e 'jdk' -e 'java'
ca-certificates-java install
java-common install
libcommons-daemon-java install
openjdk-7-jre-headless:amd64 install
tzdata-java install

# apt-cache policy cloudstack-agent
cloudstack-agent:
  Installed: 4.9.0
  Candidate: 4.9.0
  Version table:
 *** 4.9.0 0
        500 http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ Packages
        100 /var/lib/dpkg/status

# find /usr/share/ -name "cloud-utils*.jar"
/usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar

# md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
a8de7306d7c80b5a73e93b83afdd119f /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar

Management Server:

# java -version
java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)

# dpkg --get-selections | grep -e 'jdk' -e 'java'
ca-certificates-java install
java-common install
libcommons-collections3-java install
libcommons-daemon-java install
libcommons-dbcp-java install
libcommons-pool-java install
libecj-java install
libgeronimo-jta-1.1-spec-java install
libmysql-java install
libservlet2.5-java install
libtomcat6-java install
openjdk-7-jre-headless:amd64 install
tzdata-java install

# apt-cache policy cloudstack-management
cloudstack-management:
  Installed: 4.9.0
  Candidate: 4.9.0
  Version table:
 *** 4.9.0 0
        500 http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ Packages
        100 /var/lib/dpkg/status

# find /usr/share/ -name "cloud-utils*.jar"
/usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar
/usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
/usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar

# md5sum /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar
a8de7306d7c80b5a73e93b83afdd119f /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar
# md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
a8de7306d7c80b5a73e93b83afdd119f /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
# md5sum /usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar
a8de7306d7c80b5a73e93b83afdd119f /usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar

The classpath.conf was not modified:

# cat /etc/cloudstack/management/classpath.conf
#!/bin/bash
#...
SYSTEMJARS=""
SCP=$(build-classpath $SYSTEMJARS 2>/dev/null) ; if [ $? != 0 ] ; then export SCP="" ; fi
MCP=""
DCP="/usr/share/tomcat6/bin/bootstrap.jar:/usr/share/tomcat6/bin/tomcat-juli.jar"
CLASSPATH=$SCP:$DCP:$MCP:/etc/cloudstack/management:/usr/share/cloudstack-management/setup
for jarfile in ""/* ; do
  if [ ! -e "$jarfile" ] ; then continue ; fi
  CLASSPATH=$jarfile:$CLASSPATH
done
for plugin in ""/* ; do
  if [ ! -e "$plugin" ] ; then continue ; fi
  CLASSPATH=$plugin:$CLASSPATH
done
for vendorconf in "/etc/cloudstack/management"/vendor/* ; do
  if [ ! -d "$vendorconf" ] ; then continue ; fi
  CLASSPATH=$vendorconf:$CLASSPATH
done
export CLASSPATH
if ([ -z "$JAVA_HOME" ] || [ ! -d "$JAVA_HOME" ]) && [ -d /usr/lib/jvm/jre-1.7.0 ]; then
  export JAVA_HOME=/usr/lib/jvm/jre-1.7.0
fi
PATH=$JAVA_HOME/bin:/sbin:/usr/sbin:$PATH
export PATH

Regards
Martin

On 08/24/2016 06:56 PM, Rohit Yadav wrote:

Martin,

Were you able to fix your issue after installing packages from the repo Will shared and restarting the services? I've not personally tested the apt-get.eu repo, but I had earlier built this repo, which I'm personally using in my local KVM-trusty based cloud: http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/

If you're still getting the error, can you share the JRE version you're running, both on the mgmt server and on the KVM hosts? You can run java -version, or share the output of "dpkg --get-selections | grep -e 'jdk' -e 'java'". Are you running CloudStack with any additional plugins?

From the logs, it looks like there are mixed jar files; the NioConnectionException class was not found -- something's wrong with your installation. There must be a cloud-utils jar file; make sure your installation doesn't have multiple copies/versions of jars somewhere in the /usr/share/cloudstack-common and /usr/share/cloudstack-management/webapps/client/ paths:

Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions

The error "Unable to initialize the threads." suggests that the JVM was not able to spawn threads. I would like to know your JRE version and any other settings configured in /etc/cloudstack/management/classpath.conf (and there are a bunch of other files where JAVA_OPTS might have been overridden).

Note: For now you should only be using JRE1.7.

Regards.

------------------------------------------------------------------------
From: martin kolly <martin.ko...@senselan.ch>
Sent: 24 August 2016 19:53:26
To: dev@cloudstack.apache.org; Rohit Yadav
Subject: Re: CS 4.9 NIO Selector wait time PR-1601

Thanks Will!

Yes, the repo is pointing to the 4.9 release for all KVMs and for the management server:

cloudstack:~# cat /etc/apt/sources.list.d/cloudstack.list
deb http://cloudstack.apt-get.eu/ubuntu trusty 4.9

All KVM agents and the mgmt server are upgraded to release 4.9 based on the documentation. We have restarted all the cloudstack-agents and the cloudstack-management service as well.

Network traces are showing packets from KVM <-> Mgmt on port 8250. There is no security device in between.

thanks
fanfarlo

On 08/24/2016 04:13 PM, Will Stevens wrote:

@rohit, I am guessing they should be installing the cloudstack-agent using the following repo, right? That is what is described in the upgrade (trusty instead of precise though).

http://cloudstack.apt-get.eu/ubuntu/dists/trusty/4.9/

@fanfarlo, are your repos set up to point to the new 4.9 version?

cheers,
will

On Wed, Aug 24, 2016 at 9:46 AM, Rohit Yadav <rohit.ya...@shapeblue.com> wrote:

The PR and fix already exist in the 4.9.0 release. Please make sure to upgrade all of your management server(s) and KVM agents, and then also restart them after the upgrade.

If you are seeing SSL handshake failures, it could be due to a network or security issue, and is most likely due to a mismatch between the CloudStack mgmt server and KVM agent versions.

Regards.

------------------------------
From: Will Stevens <williamstev...@gmail.com>
Sent: 24 August 2016 18:17:17
To: dev@cloudstack.apache.org; Rohit Yadav
Subject: Re: CS 4.9 NIO Selector wait time PR-1601

That PR is already merged, so you don't have to do anything to get that code; you already have it.

@rohit, can you review this? I think this is similar to the issue Simon reported earlier.

Will

On Aug 24, 2016 6:56 AM, "fanfarlo" <fanfar...@gmail.com> wrote:

hi all

We have the following environment:

- OS: Ubuntu 14.04 (hypervisors and management)
- 4 KVM Hosts
- Cloudstack Release 4.9 with local database

Since we upgraded to Release 4.9, the KVM hosts no longer connect to the management server. The upgrade procedure was followed as described: http://docs.cloudstack.apache.org/projects/cloudstack-release-notes/en/4.9.0/upgrade/upgrade-4.8.html

On the KVM hosts we have the following error message:

2016-08-24 10:42:49,678 INFO [utils.exception.CSExceptionErrorCode] (main:null) (logid:) Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions
2016-08-24 10:42:49,678 WARN [cloud.agent.Agent] (main:null) (logid:) NIO Connection Exception com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
2016-08-24 10:42:49,678 INFO [cloud.agent.Agent] (main:null) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again...
2016-08-24 10:42:54,679 INFO [utils.nio.NioClient] (main:null) (logid:) Connecting to 10.100.12.10:8250
2016-08-24 10:42:54,684 WARN [utils.nio.Link] (main:null) (logid:) This SSL engine was forced to close inbound due to end of stream.
2016-08-24 10:42:54,684 ERROR [utils.nio.NioClient] (main:null) (logid:) SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
2016-08-24 10:42:54,685 ERROR [utils.nio.NioConnection] (main:null) (logid:) Unable to initialize the threads.
java.io.IOException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
    at com.cloud.utils.nio.NioClient.init(NioClient.java:67)
    at com.cloud.utils.nio.NioConnection.start(NioConnection.java:88)
    at com.cloud.agent.Agent.start(Agent.java:237)
    at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399)
    at com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:367)
    at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:351)
    at com.cloud.agent.AgentShell.start(AgentShell.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
2016-08-24 10:42:54,685 INFO [utils.exception.CSExceptionErrorCode] (main:null) (logid:) Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions
2016-08-24 10:42:54,685 WARN [cloud.agent.Agent] (main:null) (logid:) NIO Connection Exception com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
2016-08-24 10:42:54,686 INFO [cloud.agent.Agent] (main:null) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again...

The port is open on the management server; there is no firewall in between.

We found that there was a bug report here: https://issues.apache.org/jira/browse/CLOUDSTACK-9348.
There is a PR changing the NIO Selector wait time, https://github.com/apache/cloudstack/pull/1601, which was merged into the master branch.

Since we installed Release 4.9, we probably need to patch the NioConnection.class as described in PR1601, right?

kvm03# unzip -v /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar | grep NioConnection
    3923  Defl:N     1778  55% 2016-08-02 09:28 05aaf7d5  com/cloud/utils/nio/NioConnection$1.class
     881  Defl:N      495  44% 2016-08-02 09:28 e378984c  com/cloud/utils/nio/NioConnection$ChangeRequest.class
   15410  Defl:N     7130  54% 2016-08-02 09:28 b3281f5a  com/cloud/utils/nio/NioConnection.class
    1134  Defl:N      584  49% 2016-08-02 09:28 8d5cb4a8  com/cloud/utils/exception/NioConnectionException.class

Due to a lack of Java expertise we have some basic questions:

- Is there a patched jar file available? A public build server?
- Do we need to create the jar from sources? What is the procedure?
- How do we apply the patch?

many thanks!
fanfarlo
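
Two of the checks discussed in this thread (the open-file limit of 10240 suggested for the mgmt server, and whether every cloud-utils jar on a host has the same checksum) can be sketched as small shell helpers. This is only a rough sketch, assuming a POSIX shell with md5sum installed; the function names nofile_ok and jars_identical are made up for illustration and are not part of CloudStack.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not shipped with CloudStack.

# nofile_ok LIMIT MINIMUM
# Succeeds if LIMIT (e.g. the output of `ulimit -n`) is "unlimited"
# or is a number greater than or equal to MINIMUM.
nofile_ok() {
    limit=$1
    minimum=$2
    if [ "$limit" = "unlimited" ]; then
        return 0
    fi
    [ "$limit" -ge "$minimum" ]
}

# jars_identical FILE...
# Succeeds if all of the given files that exist share one MD5 checksum.
# Paths that do not exist (e.g. unmatched globs) are skipped.
jars_identical() {
    first=""
    for jar in "$@"; do
        [ -e "$jar" ] || continue
        sum=$(md5sum "$jar" | cut -d' ' -f1)
        if [ -z "$first" ]; then
            first=$sum
        elif [ "$sum" != "$first" ]; then
            return 1
        fi
    done
    return 0
}

# Example use on a mgmt server (paths as listed earlier in this thread):
#   nofile_ok "$(ulimit -n)" 10240 || echo "open-file limit is below 10240"
#   jars_identical \
#       /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-*.jar \
#       /usr/share/cloudstack-agent/lib/cloud-utils-*.jar \
#       /usr/share/cloudstack-usage/lib/cloud-utils-*.jar \
#       || echo "cloud-utils jars differ"
```

Note that ulimit -n run in an interactive shell reports that shell's limit, which is not necessarily the limit the running cloudstack-management process was started with.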