Thanks for sharing Martin, I'm glad it worked for you. Regards. ________________________________ From: martin kolly <martin.ko...@senselan.ch> Sent: 13 September 2016 00:52:49 To: dev@cloudstack.apache.org; Rohit Yadav Subject: Re: CS 4.9 NIO Selector wait time PR-1601
We could fix the issue by generating new pairs of ssh keys. Here the procedure we applied: 1) Stop Management Server 2) Delete SSH Keys in mysql Database: delete from configuration where name = "ssh.publickey" ; delete from configuration where name = "ssh.privatekey" ; 3) Delete the SSH Keys rm /var/lib/cloudstack/management/.ssh/id_rsa.pub rm /var/lib/cloudstack/management/.ssh/id_rsa 4) Start the Management Server - SSH Keys are generated and mysql entries inserted # grep ssh-keygen /var/log/cloudstack/management/management-server.log 2016-09-12 20:47:40,000 DEBUG [c.c.u.s.Script] (main:null) (logid:) Executing: /bin/bash -c if [ -f /var/lib/cloudstack/management/.ssh/id_rsa ]; then rm -f /var/lib/cloudstack/management/.ssh/id_rsa; fi; ssh-keygen -t rsa -N '' -f /var/lib/cloudstack/management/.ssh/id_rsa -q 2016-09-12 20:54:23,835 DEBUG [c.c.u.s.Script] (main:null) (logid:) Executing: /bin/bash -c if [ -f /var/lib/cloudstack/management/.ssh/id_rsa ]; then rm -f /var/lib/cloudstack/management/.ssh/id_rsa; fi; ssh-keygen -t rsa -N '' -f /var/lib/cloudstack/management/.ssh/id_rsa -q Now works like a charm! thanks and regards martin On 09/12/2016 04:22 PM, martin kolly wrote: Hi Rohit Finally we had some time to dig deeper into this issue. The /bin/bash issue was a typo on our side, sorry! System VM was destroyed/recreated several times. But something with the ssh keys is still wrong. The KVM is still asking for the pass phrase when ssh into system vms. # Management Server mysql> select * from configuration where name = "ssh.publickey" \G; *************************** 1. row *************************** category: Hidden instance: DEFAULT component: management-server name: ssh.publickey value: XiVYy4PpuS4nKoQZRsj+3VQr0dxXFt+s15ZVXiIbaacJ5RsLwlSEO75kbPaL7pDAhyZbcCtC3ICSDUAeJysZ958cTVrl390Lk0UtRoLOUKlNIuOEVzdJpZDmsrm90ZK+OYMaHKf2cVz3xekZv/BS5Kpr2dnieUNZlQlgCXgldwENNZ7+MSlPRLiJ+7/SEBtfn9VTUkXvedhbBtIh3fxexOR/xN+zGRTyotlYANxiypklrlZp6/364JfXoeQviE6aVzsVDboALW+ov6gFj8ziO7vhoUN9M6rJohGnPawjgMCBasVCjWqoFaezKlf/0vEDf3NkKOoi1kqvSDGOumCFyQ7+9BOBq8qdqnP3i4e4aJ3vLdVWlRKRaUps08xOP8+j36L9lrO0YAkGXH+bAQXzQMqIFLvjRGX+zOnEikRQ2Zv0JJSlAvftF13BQrTt1SV/ttDa0qo7fzqF1WUijZb/PcKLGUeC30U9kG4JLmrHcuaoe43I08qkGJEofUetZGQLIYhsUqXiVSflGEGfJTWJgMGKAVbGUF5a description: Public key for the entire CloudStack default_value: NULL updated: NULL scope: NULL is_dynamic: 0 1 row in set (0.00 sec) mysql> select * from configuration where name = "ssh.privatekey" \G; *************************** 1. row *************************** category: Hidden instance: DEFAULT component: management-server name: ssh.privatekey value: j+Z5ckPXUq7lwo2L4OYaaiDOA/fdUpadX3h7E/DA1rwPRSz967eckCbRRpCtCWxfQjtAgPNfLgGIQHX4K7o9yF9DBXG0ESXoECyrRAOTmfZdvdSZceRGd9TYcjwCAn4s8GKGQRKeQSQLL8Pusc/PmYLISIomkOHhZB3lp1bfniKC6uf6RwhRFNfX38ZeqVhRNgagK2hxOo67Ev7uUX1NIUWw/ROq/RC7bG58DD1uhUOX5Bm5wVMSmf3Pv7s/z2t7ZGcX8YMnW ..... .... .... .... /uPooWhUQpcG8Q8APvfEKMp0BVJATh4Gk4qkgnR4CHyefUYCDHmWZnpuCj1ra7Kn5LV7AdWG1+C7bl7hSmpxE8StLfEwe72zuPJsI86LekxhJJpsorjvSFYklJhGizSs4uuyRP2Dqayx1qFkFJe26AiWb86ld2gQ2Vwj3ozgHX2OMp60jQM8ZVM9VuroOEmiOZ5gkfhFFIf5Y78WcZU2ScSCAHVavNAIsfX+i01iNxwnXtrmrA9ctlZGuZZRR3TklAXQz4M69SFFYuo5e7OsJ2/Tr9K7AhUHA8kwR+BxPLFGRlJpyhmLUuPartDKRBAp2BwU1XsFiIs8Y6pn9wkRJaeeDdpUt9zPCYMHaHtXQvFMe+d96J4HMPvfLgGnd55Hz7GKsglX9su7XpuAqzkuPM5u6GY5LLAtXUOqE2Wm855sLRW1kbHnX8pAfIe9wQbfUQ9wsfCh66w1duce0vyuj4A7A8HC4V8+U2qXjwMIhLpfO5phfUy4drlUzBzx9ZMK+JUYP6kszj61M09psoYggUBXMYCfg6oWJSjA8= description: Private key for the entire CloudStack default_value: NULL updated: NULL scope: NULL is_dynamic: 0 1 row in set (0.00 sec) # Secondary Storage VM (note that the key is equal to the management server database entry ssh.publickey) root@s-252-VM:~# cat /root/.ssh/authorized_keys XiVYy4PpuS4nKoQZRsj+3VQr0dxXFt+s15ZVXiIbaacJ5RsLwlSEO75kbPaL7pDAhyZbcCtC3ICSDUAeJysZ958cTVrl390Lk0UtRoLOUKlNIuOEVzdJpZDmsrm90ZK+OYMaHKf2cVz3xekZv/BS5Kpr2dnieUNZlQlgCXgldwENNZ7+MSlPRLiJ+7/SEBtfn9VTUkXvedhbBtIh3fxexOR/xN+zGRTyotlYANxiypklrlZp6/364JfXoeQviE6aVzsVDboALW+ov6gFj8ziO7vhoUN9M6rJohGnPawjgMCBasVCjWqoFaezKlf/0vEDf3NkKOoi1kqvSDGOumCFyQ7+9BOBq8qdqnP3i4e4aJ3vLdVWlRKRaUps08xOP8+j36L9lrO0YAkGXH+bAQXzQMqIFLvjRGX+zOnEikRQ2Zv0JJSlAvftF13BQrTt1SV/ttDa0qo7fzqF1WUijZb/PcKLGUeC30U9kG4JLmrHcuaoe43I08qkGJEofUetZGQLIYhsUqXiVSflGEGfJTWJgMGKAVbGUF5a root@s-252-VM:~# ifconfig eth0 eth0 Link encap:Ethernet HWaddr 0e:00:a9:fe:00:18 inet addr:169.254.0.24 Bcast:169.254.255.255 Mask:255.255.0.0 #KVM Hosts root@kvm704:~/.ssh#<mailto:root@kvm704:%7E/.ssh#> cat /root/.ssh/id_rsa.pub.cloud XiVYy4PpuS4nKoQZRsj+3VQr0dxXFt+s15ZVXiIbaacJ5RsLwlSEO75kbPaL7pDAhyZbcCtC3ICSDUAeJysZ958cTVrl390Lk0UtRoLOUKlNIuOEVzdJpZDmsrm90ZK+OYMaHKf2cVz3xekZv/BS5Kpr2dnieUNZlQlgCXgldwENNZ7+MSlPRLiJ+7/SEBtfn9VTUkXvedhbBtIh3fxexOR/xN+zGRTyotlYANxiypklrlZp6/364JfXoeQviE6aVzsVDboALW+ov6gFj8ziO7vhoUN9M6rJohGnPawjgMCBasVCjWqoFaezKlf/0vEDf3NkKOoi1kqvSDGOumCFyQ7+9BOBq8qdqnP3i4e4aJ3vLdVWlRKRaUps08xOP8+j36L9lrO0YAkGXH+bAQXzQMqIFLvjRGX+zOnEikRQ2Zv0JJSlAvftF13BQrTt1SV/ttDa0qo7fzqF1WUijZb/PcKLGUeC30U9kG4JLmrHcuaoe43I08qkGJEofUetZGQLIYhsUqXiVSflGEGfJTWJgMGKAVbGUF5aroot@kvm704 root@kvm704:~/.ssh#<mailto:root@kvm704:%7E/.ssh#> ssh -p 3922 -q -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud root@169.254.0.24<mailto:root@169.254.0.24> Enter passphrase for key '/root/.ssh/id_rsa.cloud': # Router which was not redeployed root@r-191-VM:~# cat .ssh/authorized_keys ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDoZcfuh/R2i/n69hFBJEfKKtTQQAs7iOIC9PxFUWrFONk9m4R8Atj8o5wLyHflhe10iMHUJTGH84/J4min1r7zU4xGqzOBR3CKuz6dowmC+IQ38iyJfTOC9L1eIl8AebmKNwSSgjS6nc6sK13iFpgr8mSIr7ExcXPxYgWNgbQAm/HvLVc8Ivf2FZR8f26i3EyjsLVKpHoyvVMeVdoAHBdQB/NiUslAz2fz3cPXxVSxpZobd7iy/uJNVK1CHvqhGYLrXym9cjfM0BOUhN80NN9SvAiyj9D7PHiwcDBzMSSPljxwF6rPY7BXHq+7AVvzVPTJYEuBVQllYn1RrjCS/Fdr cloud@cloudstack If we understand correctly the public and private key are encrypted in the management database. Based on the Source Code master# ag ssh.publickey server/src/com/cloud/server/ConfigurationServerImpl.java 761: "VALUES ('Hidden','DEFAULT', 'management-server','ssh.publickey', '" + DBEncryptionUtil.encrypt(publicKey) + 855: String pubKey = _configDao.getValue("ssh.publickey"); - It looks like the rsa key is still encrypted on the KVM Server and on the System VMs? - Could this issue be related with the keystore cleanup? - We are running out of ideas....arrr regards Martin On 08/31/2016 07:31 PM, Rohit Yadav wrote: Hi Martin, Looks like something went wrong with the CloudStack generated public/private keys. You should not be prompted for any password when trying to ssh to systemvms on port 3922. Can you try to destroy the VRs or create a new network which would create a new VR. Regarding the /bin/bash not found issue, that looks fishy. /bin/bash should be available on both Ubuntu 14.04 based KVM hosts and also on systemvms (which are Debian7 based). Regards. ________________________________ From: martin kolly <martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch> Sent: 29 August 2016 20:10:32 To: dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org> Subject: Re: CS 4.9 NIO Selector wait time PR-1601 We have done more investigations. An error happens when the script /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh is executed: MGMT LOGS: 2016-08-29 16:00:11,342 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1837187172990452475: Processing: { Ans: , MgmtId: 90520741415395, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.GetRouterAlertsAnswer":{"result":false,"details":"/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: 1: /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: h!/bin/bash: not found","wait":0}}] } 2016-08-29 16:00:11,345 DEBUG [c.c.a.t.Request] (RouterStatusMonitor-1:ctx-9cdc1e33) (logid:5f04bf82) Seq 8-1837187172990452475: Received: { Ans: , MgmtId: 90520741415395, via: 8(kvm702), Ver: v1, Flags: 10, { GetRouterAlertsAnswer } } *2016-08-29 16:00:11,345 DEBUG [c.c.a.m.AgentManagerImpl] (RouterStatusMonitor-1:ctx-9cdc1e33) (logid:5f04bf82) Details from executing class com.cloud.agent.api.routing.GetRouterAlertsCommand: /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: 1: /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: h!/bin/bash: not found* 2016-08-29 16:00:11,345 WARN [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-9cdc1e33) (logid:5f04bf82) Unable to get alerts from router r-221-VM /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: 1: /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: h!/bin/bash: not found 2016-08-29 16:00:11,411 INFO [o.a.c.e.o.NetworkOrchestrator] (Network-Scavenger-1:ctx-64537bea) (logid:0c5d8104) NetworkGarbageCollector uses '600' seconds for GC interval. 2016-08-29 16:00:11,416 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Network-Scavenger-1:ctx-64537bea) (logid:0c5d8104) We found network 219 to be free for the first time. Adding it to the list: 1472479211 2016-08-29 16:00:12,392 DEBUG [c.c.s.StatsCollector] (StatsCollector-6:ctx-c1aeb528) (logid:a364cc05) StorageCollector is running... 2016-08-29 16:00:12,403 DEBUG [c.c.h.o.r.Ovm3HypervisorGuru] (StatsCollector-6:ctx-c1aeb528) (logid:a364cc05) getCommandHostDelegation: class com.cloud.agent.api.GetStorageStatsCommand For debugging we modified the "router_proxy.sh" slightly on the KVM Host: kvm# grep -A 2 DEBUG /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh -n 46:# DEBUG CS Upgrade *47-echo "$(date): ssh -p 3922 -q -o StrictHostKeyChecking=no -i $cert root@$domRIp /opt/cloud/bin/$script $*" >> /tmp/router.txt* 48-ssh -p 3922 -q -o StrictHostKeyChecking=no -i $cert root@$domRIp "/opt/cloud/bin/$script $*" kvm# tail /tmp/router.txt Mon Aug 29 16:10:11 CEST 2016: ssh -p 3922 -q -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud root@169.254.2.67<mailto:root@169.254.2.67> /opt/cloud/bin/netusage.sh -g Mon Aug 29 16:10:11 CEST 2016: ssh -p 3922 -q -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud root@169.254.2.67<mailto:root@169.254.2.67> /opt/cloud/bin/netusage.sh -g On the router with IP 169.254.2.67 we see the failed logins as well: root@r-221-VM:~# tail -n 2 /var/log/auth.log Aug 29 14:10:11 r-221-VM sshd[7074]: Connection closed by 169.254.0.1 [preauth] Aug 29 14:10:11 r-221-VM sshd[7075]: Connection closed by 169.254.0.1 [preauth] These commands did not succeed. If we execute the command manually we get prompted for the key passphrase. kvm# ssh -p 3922 -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud root@169.254.2.67<mailto:root@169.254.2.67> Enter passphrase for key '/root/.ssh/id_rsa.cloud': can someone confirm that the rsa key is protected with a passhrase? this looks suspicious to us... regards martin On 08/29/2016 10:09 AM, martin kolly wrote: thanks Simon and Rohit for the valuable inputs! After applying the following procedure the ssl errors are gone and host state is UP. # service cloudstack-management stop mysql> delete from configuration where name = "ssl.keystore" ; # mv /etc/cloudstack/management/cloudmanagementserver.keystore /etc/cloudstack/management/cloudmanagementserver.keystore.old # service cloudstack-management start # file /etc/cloudstack/management/cloudmanagementserver.keystore /etc/cloudstack/management/cloudmanagementserver.keystore: Java KeyStore # file /etc/cloudstack/management/cloudmanagementserver.keystore.old /etc/cloudstack/management/cloudmanagementserver.keystore.old: data #java -version java version "1.7.0_111" However new routers are not deployed, we still see entries that KVM hosts are unreachable (logs underneath). After starting a new router it is shown with "virsh list" on the KVM: kvm# virsh list | grep r-233-VM 111 r-233-VM running After some seconds the router is deleted, kvm# virsh list | grep r-233-VM # kvm#virsh -v 1.2.2 (libvirt was already restarted) ############ Logs/Info KVM ################# kvm # dpkg -l | grep cloudstack ii cloudstack-agent 4.9.0 all CloudStack agent ii cloudstack-common 4.9.0 all A common package which contains files which are shared by several CloudStack packages kvm# df -kh | grep cloud 10.100.12.9:/export/cloud 188G 137G 51G 73% /mnt/5db02c19-1e8f-3591-bdb4-02608362521e kvm# tail -f /var/log/cloudstack/agent/agent.log ... 2016-08-29 06:55:06.203+0000: 4425: error : qemuMonitorFindBalloonObjectPath:1032 : internal error: Cannot determine balloon device path 2016-08-29 06:55:06.223+0000: 4426: error : qemuMonitorFindBalloonObjectPath:1032 : internal error: Cannot determine balloon device path 2016-08-29 06:55:24.062+0000: 4425: warning : qemuDomainObjTaint:1628 : Domain id=112 name='r-233-VM' uuid=d9dcd37a-242d-43ac-a18e-79a4bfa86ebb is tainted: high-privileges 2016-08-29 06:55:49.066+0000: 4423: error : qemuMonitorIO:656 : internal error: End of file from monitor ############ Logs/Info Management Server ######## # dpkg -l | grep cloudstack ii cloudstack-agent 4.9.0 all CloudStack agent ii cloudstack-common 4.9.0 all A common package which contains files which are shared by several CloudStack packages ii cloudstack-management 4.9.0 all CloudStack server library ii cloudstack-usage 4.9.0 all CloudStack usage monitor # mysql -u root cloud -e "select id,name,path from cloud.storage_pool where pool_type='Filesystem'" +----+----------------------+-------------------------+ | id | name | path | +----+----------------------+-------------------------+ | 1 | kvm704 Local Storage | /var/lib/libvirt/images | | 2 | kvm701 Local Storage | /var/lib/libvirt/images | | 3 | kvm702 Local Storage | /var/lib/libvirt/images | | 4 | kvm703 Local Storage | /var/lib/libvirt/images | +----+----------------------+-------------------------+ #tail -f /var/log/cloudstack/management/management-server.log | grep -v DEBUG ... 2016-08-29 08:55:42,484 WARN [o.a.c.alerts] (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) (logid:337b354e) alertType:: 9 // dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: Command: com.cloud.agent.api.GetDomRVersionCommand failed while starting virtual router 2016-08-29 08:55:42,494 ERROR [c.c.n.r.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) (logid:337b354e) GetDomRVersionCmd failed 2016-08-29 08:55:42,495 WARN [c.c.n.r.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) (logid:337b354e) Command: com.cloud.agent.api.GetDomRVersionCommand failed while starting virtual router 2016-08-29 08:55:42,495 INFO [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) (logid:337b354e) The guru did not like the answers so stopping VM[DomainRouter|r-233-VM] 2016-08-29 08:55:49,184 INFO [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-5bd7c179) (logid:86c0376e) Begin cleanup expired async-jobs 2016-08-29 08:55:49,192 INFO [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-5bd7c179) (logid:86c0376e) End cleanup expired async-jobs 2016-08-29 08:55:50,861 ERROR [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) (logid:337b354e) Failed to start instance VM[DomainRouter|r-233-VM] com.cloud.utils.exception.ExecutionException: Unable to start VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart, not retrying at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084) at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592) at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107) at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753) at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102) at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554) at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2016-08-29 08:55:51,136 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) (logid:337b354e) Invocation exception, caused by: com.cloud.exception.AgentUnavailableException: Resource [Host:8] is unreachable: Host 8: Unable to start instance due to Unable to start VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart, not retrying 2016-08-29 08:55:51,136 INFO [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) (logid:337b354e) Rethrow exception com.cloud.exception.AgentUnavailableException: Resource [Host:8] is unreachable: Host 8: Unable to start instance due to Unable to start VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart, not retrying 2016-08-29 08:55:51,137 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739) (logid:337b354e) Unable to complete AsyncJobVO {id:2739, userId: 2, accountId: 2, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAACAAAAAAAAAAIAAAAAAAAA6XQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwc3IAEWphdmEudXRpbC5IYXNoTWFwBQfawcMWYNEDAAJGAApsb2FkRmFjdG9ySQAJdGhyZXNob2xkeHA_QAAAAAAADHcIAAAAEAAAAAF0AA5SZXN0YXJ0TmV0d29ya3QAP3JPMEFCWE55QUJGcVlYWmhMbXhoYm1jdVFtOXZiR1ZoYnMwZ2NvRFZuUHJ1QWdBQldnQUZkbUZzZFdWNGNBRXhw, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 90520741415395, completeMsid: null, lastUpdated: null, lastPolled: null, created: Mon Aug 29 08:55:21 CEST 2016}, job origin:2738 com.cloud.exception.AgentUnavailableException: Resource [Host:8] is unreachable: Host 8: Unable to start instance due to Unable to start VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart, not retrying at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1120) at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592) at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107) at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753) at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102) at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554) at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart, not retrying at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084) at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592) at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source) ... 17 more 2016-08-29 08:55:51,168 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739) (logid:337b354e) Remove job-2739 from job monitoring 2016-08-29 08:55:53,197 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-113:ctx-6411b64b job-2738/job-2740) (logid:db064d99) Add job-2740 into job monitoring 2016-08-29 08:55:53,218 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-113:ctx-6411b64b job-2738/job-2740) (logid:337b354e) Remove job-2740 from job monitoring 2016-08-29 08:55:53,247 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-68:ctx-d2eb7557 job-2738) (logid:337b354e) Unexpected exception while executing org.apache.cloudstack.api.command.admin.router.StartRouterCmd com.cloud.exception.AgentUnavailableException: Resource [Host:8] is unreachable: Host 8: Unable to start instance due to Unable to start VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart, not retrying at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1120) at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592) at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107) at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753) at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102) at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554) at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart, not retrying at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084) at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592) at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source) ... 17 more 2016-08-29 08:55:53,256 INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-68:ctx-d2eb7557 job-2738) (logid:337b354e) Remove job-2738 from job monitoring regards Martin On 08/26/2016 08:47 PM, Rohit Yadav wrote: Hi Martin, I checked my openssl connect output and my management server on port 8250 was indeed returning valid certificates. On further investigation, when connections to management server (NioServer, handled by Link class that reads /etc/cloudstack/management/cloudmanagementserver.keystore) are received it uses a keystore file (/etc/cloudstack/management/cloudmanagementserver.keystore). Please check if that's valid, or backup your keystore file and do what Simon has suggested. For reference, here the output from my 4.9.0 mgmt server running on Ubuntu 14.04.4: $ openssl s_client -tls1 -connect 192.168.1.11:8250 CONNECTED(00000003) depth=0 C = Unknown, O = bluebox, OU = bluebox, CN = Cloudstack User verify error:num=18:self signed certificate verify return:1 depth=0 C = Unknown, O = bluebox, OU = bluebox, CN = Cloudstack User verify return:1 --- Certificate chain 0 s:/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User i:/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User --- Server certificate -----BEGIN CERTIFICATE----- MIIDPzCCAiegAwIBAgIEFHHX2DANBgkqhkiG9w0BAQsFADBQMRAwDgYDVQQGEwdV bmtub3duMRAwDgYDVQQKEwdibHVlYm94MRAwDgYDVQQLEwdibHVlYm94MRgwFgYD VQQDEw9DbG91ZHN0YWNrIFVzZXIwHhcNMTUwNTA2MTM0MjMzWhcNMjUwNTAzMTM0 MjMzWjBQMRAwDgYDVQQGEwdVbmtub3duMRAwDgYDVQQKEwdibHVlYm94MRAwDgYD VQQLEwdibHVlYm94MRgwFgYDVQQDEw9DbG91ZHN0YWNrIFVzZXIwggEiMA0GCSqG SIb3DQEBAQUAA4IBDwAwggEKAoIBAQC0wN8kfOJMzwlbrOnBj/jjvjjIwDVpYLtH WoKkNB+rzzKEUxaYwaQxe6E3M536ZuqcaJBIqcYPwTIkWyulvuHuJpSQak4VbuDV f7dqt5RacLFT0jUciqTvL5QDCrk0uNugKkWgEvNtokVGSBwLPVEcdcGWpku1EpeH vMYmpOkcWgbC8Z9D7QTlVw6oEWbPAtKr+gDrXdOFnpPPI45rteatIIgKm1Q6JjZM qrUKfqt7s8ts6ZgdAN2WmtieSsnUX1su9SJMYg2J8LK7UJeGqiNtE+g944GPqtnW aDldjharq54e79ug2ktxw29I3ulpRD/vgxwZmcPJrePUKwY91KEjAgMBAAGjITAf MB0GA1UdDgQWBBSUJvLY8RL/1fTVqj1rT8136da4yzANBgkqhkiG9w0BAQsFAAOC AQEAgDdFIgLvOH/UgRp2nnFUVcMp+uchSLj8CbCkukJBrUwrmJHp3Os+H1ggk8Vt j3conj06zJBNN/E0J8pcpagE1aR+l4R8WxF3g/Oc7bNyrUlkGSQ82vavg9sEkwHY eQY/4wj8CprICs9JilgZ6keeWNWgAW1goLZSzGVwz5eE0lPuc2Dg3laR5RsuTxie dgQhpbOx3UZun+dhuP5NUHc+KWyrNvSZNN8FruO602KWZwm0Hndl7RVbkNEd0kxq FhFK4Scc2HBrKMUrPTzO1nGCgR1gA015C2MFfmjeW49VTi95WnY8DDG2euUYAtpl lLVuNxJxq7eDJfP/M9kxSIKgrQ== -----END CERTIFICATE----- subject=/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User issuer=/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User --- No client certificate CA names sent Server Temp Key: ECDH, P-256, 256 bits --- SSL handshake has read 1325 bytes and written 331 bytes --- New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-SHA Server public key is 2048 bit Secure Renegotiation IS supported Compression: NONE Expansion: NONE No ALPN negotiated SSL-Session: Protocol : TLSv1 Cipher : ECDHE-RSA-AES256-SHA Session-ID: 57C08E5D841D9B74AB533A0000BDBE4275424640B5B7D1985D438B275598436F Session-ID-ctx: Master-Key: E1DE7D7464766DCECD82088561A7D928519CA20BA91FDF4AA38BB2E51AB7C27D3F1ABA5788ADED2AD49FE8597EC4A344 Key-Arg : None PSK identity: None PSK identity hint: None SRP username: None Start Time: 1472237149 Timeout : 7200 (sec) Verify return code: 18 (self signed certificate) --- Regards. ________________________________ From: martin kolly <martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch> Sent: 26 August 2016 20:36:02 To: dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org> Subject: Re: CS 4.9 NIO Selector wait time PR-1601 good point, thanks Simon! with openssl we receive a response on port 8250. # telnet 10.100.12.10 8250 Trying 10.100.12.10... Connected to 10.100.12.10. Escape character is '^]'. Connection closed by foreign host. # nc -zv 10.100.12.10 8250 Connection to 10.100.12.10 8250 port [tcp/*] succeeded! # openssl s_client -tls1 -connect 10.100.12.10:8250 CONNECTED(00000003) write:errno=104 --- no peer certificate available --- No client certificate CA names sent --- SSL handshake has read 0 bytes and written 0 bytes --- New, (NONE), Cipher is (NONE) Secure Renegotiation IS NOT supported Compression: NONE Expansion: NONE SSL-Session: Protocol : TLSv1 Cipher : 0000 Session-ID: Session-ID-ctx: Master-Key: Key-Arg : None PSK identity: None PSK identity hint: None SRP username: None Start Time: 1472223447 Timeout : 7200 (sec) Verify return code: 0 (ok) --- On 08/26/2016 04:49 PM, Simon Weller wrote: Martin, Are you able to actually telnet to 8250 from the host to the mgmt server? - Si ________________________________ From: martin kolly <martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch> Sent: Friday, August 26, 2016 9:41 AM To: dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org> Subject: Re: CS 4.9 NIO Selector wait time PR-1601 Hi Rohit We highly appreciate your efforts! Unfortunately it still does not work. - ulimit is increased on mgmt server - jar file replaced - we confirm that cloudstack-agent 4.9.0 is installed MGMT Server # wget https://github.com/rhtyd/cloudstack/releases/download/4.9.0-nioinbound/cloud-utils-4.9.0.jar -O cloud-utils-4.9.0.jar.patch # md5sum cloud-utils-4.9.0.jar.patch c4496f42cc6741f562ac645c3a3d8a0c cloud-utils-4.9.0.jar.patch # md5sum /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar c4496f42cc6741f562ac645c3a3d8a0c /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar # ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 64109 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 10240 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 64109 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited KVM Server # wget https://github.com/rhtyd/cloudstack/releases/download/4.9.0-nioinbound/cloud-utils-4.9.0.jar -O cloud-utils-4.9.0.jar.patch # md5sum cloud-utils-4.9.0.jar.patch c4496f42cc6741f562ac645c3a3d8a0c cloud-utils-4.9.0.jar.patch # md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar c4496f42cc6741f562ac645c3a3d8a0c /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar # apt-cache policy cloudstack-agent cloudstack-agent: Installed: 4.9.0 Candidate: 4.9.0 Version table: *** 4.9.0 0 500 http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ Packages 100 /var/lib/dpkg/status The logs are attached. By the way: the error message "Caught the Exception in VmIpFetchTask" was already there with 4.8 release. Thanks Martin On 08/25/2016 07:44 PM, Rohit Yadav wrote: Hi Martin, Thanks for sharing. Alright, I'm not sure what's causing issue but based on the logs seems like only KVM agents are having issues while connecting to mgmt server as I don't see any Nio related exceptions in the management server logs. I could not see the cloudstack-agent version in the logs, I'm assuming that they were all upgraded to 4.9.0, and there are no conflicting jars at /usr/share/cloudstack-agent/lib. First, can you make sure mgmt server has enough ulimit. I found that Ubuntu/Debian's init.d script don't override this while CentOS initd/systemd script sets ulimit. On your mgmt server, edit /etc/init.d/cloudstack-management and add ulimit -n 10240 just before the mgmt server is started in the 'state' section (for me it was at around line #147 where it logs a message that it's starting the cloudstack-management server). Next, if this still does not solve the issue -- I created a special cloud-utils.jar for you that you need to place on your mgmt server and on the KVM agents and restart the mgmt server. This will increase verbosity of the error while reduce the Nio polling loop timeout (from 100ms to 10ms). On KVM agents, the error from the logs is that during SSL handshake inbound connection/stream gets closed, and we want to know the exception message. Please get the jar from here: https://github.com/rhtyd/cloudstack/releases/tag/4.9.0-nioinbound and place them at: /usr/share/cloudstack-agent/lib/ (on kvm host) /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/ (on mgmt server host) Let me know what worked for you, and if it still failed can you share the mgmt server and agent logs once again. Thanks. Regards. ________________________________ From: martin kolly <martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch> Sent: 25 August 2016 20:50:08 To: dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org> Subject: Re: CS 4.9 NIO Selector wait time PR-1601 Hi Rohit We are running java version 1.7.0.111 on KVM and management server. mgmt# java -version java version "1.7.0_111" kvm# java -version java version "1.7.0_111" We get the same error message. Attached are the logs with TRACE enabled. "success consists of going from failure to failure without loss of enthusiasm." regards martin On 08/25/2016 02:02 PM, Rohit Yadav wrote: Hi Martin, Thanks for sharing, on the surface there does not seem to be any issue in configuration causing the failures. I'm personally running KVM and Ubuntu hosts based env without issues, I'm on Ubuntu 14.04.4 (Linux bluebox 3.16.0-45-generic #60~14.04.1-Ubuntu) and java 1.7.0_79. Can you try upgrading your JRE7 to latest (openjdk-7-jre, 7u111-2.6.7-0ubuntu0.14.04.3) on all mgmt server and kvm hosts? If upgrading your JRE does not help, can you increase the logging verbosity for both the agent and management server (in /etc/cloudstack/{agent, management} there would be a log4j file, edit that and replace DEBUG/INFO with TRACE for class/keys com.cloud and org.apache.cloudstack) and re-share logs when the failures occur? I want to see what additional information we can get from logs when it tries to connect to host 10.100.12.10 on port: 8250. Regards. ________________________________ From: martin kolly <martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch> Sent: 25 August 2016 17:11:06 To: dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org> Subject: Re: CS 4.9 NIO Selector wait time PR-1601 @Simon: We have one management server with local DB. KVMs connect directly to the management server without any security/loadbalancing device. Thanks Martin On 08/25/2016 12:41 PM, Simon Weller wrote: Martin, Can you provide more detail about your haproxy setup? Are you running it on separate servers, or on the management server itself? - Si Simon Weller/ENA (615) 312-6068 rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com> www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com> 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue rohit.ya...@shapeblue.comĀ www.shapeblue.com 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue -----Original Message----- From: martin kolly [martin.ko...@senselan.ch<mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch>] Received: Thursday, 25 Aug 2016, 5:04AM To: Rohit Yadav [rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com>]; dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org> [dev@cloudstack.apache.org<mailto:dev@cloudstack.a%0Apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org>] Subject: Re: CS 4.9 NIO Selector wait time PR-1601 thanks for your reply. This morning we repeated the upgrade process from 4.8 to 4.9 with the following repository: http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/. <http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/> Unfortunately we run into the same issue: /2016-08-25 09:49:00,660 INFO [utils.nio.NioClient] (main:null) (logid:) Connecting to 10.100.12.10:8250// //2016-08-25 09:49:00,668 WARN [utils.nio.Link] (main:null) (logid:) This SSL engine was forced to close inbound due to end of stream.// //2016-08-25 09:49:00,668 ERROR [utils.nio.NioClient] (main:null) (logid:) SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250// //2016-08-25 09:49:00,668 ERROR [utils.nio.NioConnection] (main:null) (logid:) Unable to initialize the threads.// //java.io.IOException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250// // at com.cloud.utils.nio.NioClient.init(NioClient.java:67)// // at com.cloud.utils.nio.NioConnection.start(NioConnection.java:88)// // at com.cloud.agent.Agent.start(Agent.java:237)// // at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399)// // at com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:367)// // at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:351)// // at com.cloud.agent.AgentShell.start(AgentShell.java:456)// // at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)// // at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)// // at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)// // at java.lang.reflect.Method.invoke(Method.java:606)// // at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)// //2016-08-25 09:49:00,669 INFO [utils.exception.CSExceptionErrorCode] (main:null) (logid:) Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions// //2016-08-25 09:49:00,669 WARN [cloud.agent.Agent] (main:null) (logid:) NIO Connection Exception com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250// //2016-08-25 09:49:00,670 INFO [cloud.agent.Agent] (main:null) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again.../ *KVM Hosts: */# java -version java version "1.7.0_95" OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.1) OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode) # dpkg --get-selections | grep -e 'jdk' -e 'java' ca-certificates-java install java-common install libcommons-daemon-java install openjdk-7-jre-headless:amd64 install tzdata-java install # apt-cache policy cloudstack-agent cloudstack-agent: Installed: 4.9.0 Candidate: 4.9.0 Version table: *** 4.9.0 0 500 http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ Packages 100 /var/lib/dpkg/status # find /usr/share/ -name "cloud-utils*.jar" /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar # md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar a8de7306d7c80b5a73e93b83afdd119f /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar /*Management Server: */# java -version// //java version "1.7.0_95"// //OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.1)// //OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)// // //# dpkg --get-selections | grep -e 'jdk' -e 'java'// //ca-certificates-java install// //java-common install// //libcommons-collections3-java install// //libcommons-daemon-java install// //libcommons-dbcp-java install// //libcommons-pool-java install// //libecj-java install// //libgeronimo-jta-1.1-spec-java install// //libmysql-java install// //libservlet2.5-java install// //libtomcat6-java install// //openjdk-7-jre-headless:amd64 install// //tzdata-java install// // //# apt-cache policy cloudstack-management// //cloudstack-management:// // Installed: 4.9.0// // Candidate: 4.9.0// // Version table:// // *** 4.9.0 0// // 500 http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ Packages// // 100 /var/lib/dpkg/status/// /# find /usr/share/ -name "cloud-utils*.jar"// ///usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar// ///usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar// ///usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar// //# md5sum /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar// //a8de7306d7c80b5a73e93b83afdd119f /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar// //# md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar// //a8de7306d7c80b5a73e93b83afdd119f /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar// //# md5sum /usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar// //a8de7306d7c80b5a73e93b83afdd119f /usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar/ The classpath.conf was not modified: /# cat /etc/cloudstack/management/classpath.conf #!/bin/bash #... SYSTEMJARS="" SCP=$(build-classpath $SYSTEMJARS 2>/dev/null) ; if [ $? != 0 ] ; then export SCP="" ; fi MCP="" DCP="/usr/share/tomcat6/bin/bootstrap.jar:/usr/share/tomcat6/bin/tomcat-juli.jar" CLASSPATH=$SCP:$DCP:$MCP:/etc/cloudstack/management:/usr/share/cloudstack-management/setup for jarfile in ""/* ; do if [ ! -e "$jarfile" ] ; then continue ; fi CLASSPATH=$jarfile:$CLASSPATH done for plugin in ""/* ; do if [ ! -e "$plugin" ] ; then continue ; fi CLASSPATH=$plugin:$CLASSPATH done for vendorconf in "/etc/cloudstack/management"/vendor/* ; do if [ ! -d "$vendorconf" ] ; then continue ; fi CLASSPATH=$vendorconf:$CLASSPATH done export CLASSPATH if ([ -z "$JAVA_HOME" ] || [ ! -d "$JAVA_HOME" ]) && [ -d /usr/lib/jvm/jre-1.7.0 ]; then export JAVA_HOME=/usr/lib/jvm/jre-1.7.0 fi PATH=$JAVA_HOME/bin:/sbin:/usr/sbin:$PATH export PATH/ Regards Martin On 08/24/2016 06:56 PM, Rohit Yadav wrote: Martin, Were you able to fix your issue after installing packages from the repo Will shared and restarting the services? I've not personally tested the apt-get.eu repo, but I had earlier built this repo which I'm personally using in my local KVM-trusty based cloud: http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ If you're still getting the error, can you share the JRE version you're running, both on the mgmt server and on the KVM hosts? You can run java -version, or share output of "dpkg --get-selections | grep -e 'jdk' -e 'java'". Are you running CloudStack with any additional plugins? >From the logs, looks like there are mixed jar files, NioConnectionException class was not found -- something's wrong with your installation. there must be a cloud-utils jar file make sure your installation don't have multiple copies/versions of jars (somewhere) in the in /usr/share/cloudstack-common and in /usr/share/cloudstack-management/webapps/client/ paths: Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions The error "Unable to initialize the threads." suggests, JVM was not able to spawn threads. I would like to know your JRE version and any other settings configured in /etc/cloudstack/management/classpath.conf (and there are bunch of other files where JAVA_OPTS might have been overridden). Note: For now you should only be using JRE1.7. Regards. rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com> www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com> @shapeblue ------------------------------------------------------------------------ *From:* martin kolly <martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch> *Sent:* 24 August 2016 19:53:26 *To:* dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org>; Rohit Yadav *Subject:* Re: CS 4.9 NIO Selector wait time PR-1601 Thanks Will! yes the repo is pointing to 4.9 release for all KVMs and for the management server: /cloudstack:~# cat /etc/apt/sources.list.d/cloudstack.list // //deb http://cloudstack.apt-get.eu/ubuntu trusty 4.9/ All KVM agents and the mgmt server are upgraded to release 4.9 based on the documentation.We have restarted all the cloudstack-agents and the cloudstack-management service as well. Network traces are showing packets from KVM <-> Mgmt on port 8250. there is no security device in between. thanks fanfarlo On 08/24/2016 04:13 PM, Will Stevens wrote: @rohit, I am guessing they should be installing the cloudstack-agent using the following repo right? That is what is described in the upgrade (trusty instead of precise though). http://cloudstack.apt-get.eu/ubuntu/dists/trusty/4.9/ @fanfarlo, are your repo's setup to point to the new 4.9 version? cheers, will On Wed, Aug 24, 2016 at 9:46 AM, Rohit Yadav <rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com> wrote: The PR and fix already exists in 4.9.0 release. Please make sure to upgrade all of your management server(s) and KVM agents and then also restart them after the upgrade. If you are seeing SSL handshake failures, it could be due to network or security issue and most likely due to mismatch between CloudStack mgmt server and KVM agent version. Regards. rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com> www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com> @shapeblue ------------------------------ *From:* Will Stevens <williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com> *Sent:* 24 August 2016 18:17:17 *To:* dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org>; Rohit Yadav *Subject:* Re: CS 4.9 NIO Selector wait time PR-1601 That PR is already merged, so you don't have to do anything to get that code, you already have it. @rohit, can you review this? I think this is a similar to the issue Simon reported earlier. Will On Aug 24, 2016 6:56 AM, "fanfarlo" <fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com> wrote: hi all We have the following environment: - OS: Debian 14.04 (hypervisors and management) - 4 KVM Hosts - Cloudstack Release 4.9 with local database Since we upgraded to Release 4.9 the KVM hosts no longer connect to the management Server. Upgrade procedure was followed as described: http://docs.cloudstack.apache.org/projects/cloudstack-releas e-notes/en/4.9.0/upgrade/upgrade-4.8.html On the KVM hosts we have the following error message: /2016-08-24 10:42:49,678 INFO [utils.exception.CSExceptionErrorCode] (main:null) (logid:) Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions 2016-08-24 10:42:49,678 WARN [cloud.agent.Agent] (main:null) (logid:) NIO Connection Exception com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250 2016-08-24 10:42:49,678 INFO [cloud.agent.Agent] (main:null) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again... 2016-08-24 10:42:54,679 INFO [utils.nio.NioClient] (main:null) (logid:) Connecting to 10.100.12.10:8250 2016-08-24 10:42:54,684 WARN [utils.nio.Link] (main:null) (logid:) This SSL engine was forced to close inbound due to end of stream. 2016-08-24 10:42:54,684 ERROR [utils.nio.NioClient] (main:null) (logid:) SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250 2016-08-24 10:42:54,685 ERROR [utils.nio.NioConnection] (main:null) (logid:) Unable to initialize the threads. java.io.IOException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250 at com.cloud.utils.nio.NioClient.init(NioClient.java:67) at com.cloud.utils.nio.NioConnection.start(NioConnection.java:88) at com.cloud.agent.Agent.start(Agent.java:237) at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399) at com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:367) at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:351) at com.cloud.agent.AgentShell.start(AgentShell.java:456) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce ssorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe thodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.commons.daemon.support.DaemonLoader.start(DaemonL oader.java:243) 2016-08-24 10:42:54,685 INFO [utils.exception.CSExceptionErrorCode] (main:null) (logid:) Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions 2016-08-24 10:42:54,685 WARN [cloud.agent.Agent] (main:null) (logid:) NIO Connection Exception com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250 2016-08-24 10:42:54,686 INFO [cloud.agent.Agent] (main:null) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again.../ Port is open on the management server, there is no firewall in between. We found that there was a bug report here: https://issues.apache.org/jira/browse/CLOUDSTACK-9348. There is a PR changing the NIO Selector wait time: https://github.com/apache/cloudstack/pull/1601 which was merged into master branch. Since we installed Release 4.9 we probably need to patch the NioConection.class as described in PR1601 , right? kvm03# unzip -v /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar | grep NioConnection 3923 Defl:N 1778 55% 2016-08-02 09:28 05aaf7d5 com/cloud/utils/nio/NioConnection$1.class 881 Defl:N 495 44% 2016-08-02 09:28 e378984c com/cloud/utils/nio/NioConnection$ChangeRequest.class 15410 Defl:N 7130 54% 2016-08-02 09:28 b3281f5a com/cloud/utils/nio/NioConnection.class 1134 Defl:N 584 49% 2016-08-02 09:28 8d5cb4a8 com/cloud/utils/exception/NioConnectionException.class Due to a lack of java expertise we have some basic questions: - Is there a patched jar file available ? public build server? - Do we need to create the jar from sources ? procedure? - How do we apply the patch ? many thanks! fanfarlo rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com> www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com> 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com> www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com> 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com> www.shapeblue.com<http://www.shapeblue.com> 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue