Thanks for sharing Martin, I'm glad it worked for you.

Regards.
________________________________
From: martin kolly <martin.ko...@senselan.ch>
Sent: 13 September 2016 00:52:49
To: dev@cloudstack.apache.org; Rohit Yadav
Subject: Re: CS 4.9 NIO Selector wait time PR-1601


We could fix the issue by generating new pairs of ssh keys. Here the procedure 
we applied:

1) Stop Management Server

2) Delete SSH Keys in mysql Database:
delete from configuration  where name = "ssh.publickey" ;
delete from configuration  where name = "ssh.privatekey" ;

3) Delete the SSH Keys
rm /var/lib/cloudstack/management/.ssh/id_rsa.pub
rm /var/lib/cloudstack/management/.ssh/id_rsa

4) Start the Management Server
- SSH Keys are generated and mysql entries inserted

# grep  ssh-keygen /var/log/cloudstack/management/management-server.log
2016-09-12 20:47:40,000 DEBUG [c.c.u.s.Script] (main:null) (logid:) Executing: 
/bin/bash -c if [ -f /var/lib/cloudstack/management/.ssh/id_rsa ]; then rm -f 
/var/lib/cloudstack/management/.ssh/id_rsa; fi; ssh-keygen -t rsa -N '' -f 
/var/lib/cloudstack/management/.ssh/id_rsa -q
2016-09-12 20:54:23,835 DEBUG [c.c.u.s.Script] (main:null) (logid:) Executing: 
/bin/bash -c if [ -f /var/lib/cloudstack/management/.ssh/id_rsa ]; then rm -f 
/var/lib/cloudstack/management/.ssh/id_rsa; fi; ssh-keygen -t rsa -N '' -f 
/var/lib/cloudstack/management/.ssh/id_rsa -q

Now works like a charm!

thanks and regards
martin


On 09/12/2016 04:22 PM, martin kolly wrote:
Hi Rohit

Finally we had some time to dig deeper into this issue. The /bin/bash issue was 
a typo on our side, sorry!

System VM was destroyed/recreated several times.

But something with the ssh keys is still wrong. The KVM is still asking for the 
pass phrase when ssh into system vms.

# Management Server
mysql> select * from configuration where name = "ssh.publickey" \G;
*************************** 1. row ***************************
     category: Hidden
     instance: DEFAULT
    component: management-server
         name: ssh.publickey
        value: 
XiVYy4PpuS4nKoQZRsj+3VQr0dxXFt+s15ZVXiIbaacJ5RsLwlSEO75kbPaL7pDAhyZbcCtC3ICSDUAeJysZ958cTVrl390Lk0UtRoLOUKlNIuOEVzdJpZDmsrm90ZK+OYMaHKf2cVz3xekZv/BS5Kpr2dnieUNZlQlgCXgldwENNZ7+MSlPRLiJ+7/SEBtfn9VTUkXvedhbBtIh3fxexOR/xN+zGRTyotlYANxiypklrlZp6/364JfXoeQviE6aVzsVDboALW+ov6gFj8ziO7vhoUN9M6rJohGnPawjgMCBasVCjWqoFaezKlf/0vEDf3NkKOoi1kqvSDGOumCFyQ7+9BOBq8qdqnP3i4e4aJ3vLdVWlRKRaUps08xOP8+j36L9lrO0YAkGXH+bAQXzQMqIFLvjRGX+zOnEikRQ2Zv0JJSlAvftF13BQrTt1SV/ttDa0qo7fzqF1WUijZb/PcKLGUeC30U9kG4JLmrHcuaoe43I08qkGJEofUetZGQLIYhsUqXiVSflGEGfJTWJgMGKAVbGUF5a
  description: Public key for the entire CloudStack
default_value: NULL
      updated: NULL
        scope: NULL
   is_dynamic: 0
1 row in set (0.00 sec)

mysql> select * from configuration where name = "ssh.privatekey" \G;
*************************** 1. row ***************************
     category: Hidden
     instance: DEFAULT
    component: management-server
         name: ssh.privatekey
        value: 
j+Z5ckPXUq7lwo2L4OYaaiDOA/fdUpadX3h7E/DA1rwPRSz967eckCbRRpCtCWxfQjtAgPNfLgGIQHX4K7o9yF9DBXG0ESXoECyrRAOTmfZdvdSZceRGd9TYcjwCAn4s8GKGQRKeQSQLL8Pusc/PmYLISIomkOHhZB3lp1bfniKC6uf6RwhRFNfX38ZeqVhRNgagK2hxOo67Ev7uUX1NIUWw/ROq/RC7bG58DD1uhUOX5Bm5wVMSmf3Pv7s/z2t7ZGcX8YMnW
.....
....
....
....
/uPooWhUQpcG8Q8APvfEKMp0BVJATh4Gk4qkgnR4CHyefUYCDHmWZnpuCj1ra7Kn5LV7AdWG1+C7bl7hSmpxE8StLfEwe72zuPJsI86LekxhJJpsorjvSFYklJhGizSs4uuyRP2Dqayx1qFkFJe26AiWb86ld2gQ2Vwj3ozgHX2OMp60jQM8ZVM9VuroOEmiOZ5gkfhFFIf5Y78WcZU2ScSCAHVavNAIsfX+i01iNxwnXtrmrA9ctlZGuZZRR3TklAXQz4M69SFFYuo5e7OsJ2/Tr9K7AhUHA8kwR+BxPLFGRlJpyhmLUuPartDKRBAp2BwU1XsFiIs8Y6pn9wkRJaeeDdpUt9zPCYMHaHtXQvFMe+d96J4HMPvfLgGnd55Hz7GKsglX9su7XpuAqzkuPM5u6GY5LLAtXUOqE2Wm855sLRW1kbHnX8pAfIe9wQbfUQ9wsfCh66w1duce0vyuj4A7A8HC4V8+U2qXjwMIhLpfO5phfUy4drlUzBzx9ZMK+JUYP6kszj61M09psoYggUBXMYCfg6oWJSjA8=
  description: Private key for the entire CloudStack
default_value: NULL
      updated: NULL
        scope: NULL
   is_dynamic: 0
1 row in set (0.00 sec)

# Secondary Storage VM (note that the key is equal to the management server 
database entry ssh.publickey)
root@s-252-VM:~# cat /root/.ssh/authorized_keys
XiVYy4PpuS4nKoQZRsj+3VQr0dxXFt+s15ZVXiIbaacJ5RsLwlSEO75kbPaL7pDAhyZbcCtC3ICSDUAeJysZ958cTVrl390Lk0UtRoLOUKlNIuOEVzdJpZDmsrm90ZK+OYMaHKf2cVz3xekZv/BS5Kpr2dnieUNZlQlgCXgldwENNZ7+MSlPRLiJ+7/SEBtfn9VTUkXvedhbBtIh3fxexOR/xN+zGRTyotlYANxiypklrlZp6/364JfXoeQviE6aVzsVDboALW+ov6gFj8ziO7vhoUN9M6rJohGnPawjgMCBasVCjWqoFaezKlf/0vEDf3NkKOoi1kqvSDGOumCFyQ7+9BOBq8qdqnP3i4e4aJ3vLdVWlRKRaUps08xOP8+j36L9lrO0YAkGXH+bAQXzQMqIFLvjRGX+zOnEikRQ2Zv0JJSlAvftF13BQrTt1SV/ttDa0qo7fzqF1WUijZb/PcKLGUeC30U9kG4JLmrHcuaoe43I08qkGJEofUetZGQLIYhsUqXiVSflGEGfJTWJgMGKAVbGUF5a

root@s-252-VM:~# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 0e:00:a9:fe:00:18
          inet addr:169.254.0.24  Bcast:169.254.255.255  Mask:255.255.0.0

#KVM Hosts
root@kvm704:~/.ssh#<mailto:root@kvm704:%7E/.ssh#> cat 
/root/.ssh/id_rsa.pub.cloud
XiVYy4PpuS4nKoQZRsj+3VQr0dxXFt+s15ZVXiIbaacJ5RsLwlSEO75kbPaL7pDAhyZbcCtC3ICSDUAeJysZ958cTVrl390Lk0UtRoLOUKlNIuOEVzdJpZDmsrm90ZK+OYMaHKf2cVz3xekZv/BS5Kpr2dnieUNZlQlgCXgldwENNZ7+MSlPRLiJ+7/SEBtfn9VTUkXvedhbBtIh3fxexOR/xN+zGRTyotlYANxiypklrlZp6/364JfXoeQviE6aVzsVDboALW+ov6gFj8ziO7vhoUN9M6rJohGnPawjgMCBasVCjWqoFaezKlf/0vEDf3NkKOoi1kqvSDGOumCFyQ7+9BOBq8qdqnP3i4e4aJ3vLdVWlRKRaUps08xOP8+j36L9lrO0YAkGXH+bAQXzQMqIFLvjRGX+zOnEikRQ2Zv0JJSlAvftF13BQrTt1SV/ttDa0qo7fzqF1WUijZb/PcKLGUeC30U9kG4JLmrHcuaoe43I08qkGJEofUetZGQLIYhsUqXiVSflGEGfJTWJgMGKAVbGUF5aroot@kvm704

root@kvm704:~/.ssh#<mailto:root@kvm704:%7E/.ssh#> ssh -p 3922 -q -o 
StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud 
root@169.254.0.24<mailto:root@169.254.0.24>
Enter passphrase for key '/root/.ssh/id_rsa.cloud':

# Router which was not redeployed
root@r-191-VM:~# cat .ssh/authorized_keys
ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQDoZcfuh/R2i/n69hFBJEfKKtTQQAs7iOIC9PxFUWrFONk9m4R8Atj8o5wLyHflhe10iMHUJTGH84/J4min1r7zU4xGqzOBR3CKuz6dowmC+IQ38iyJfTOC9L1eIl8AebmKNwSSgjS6nc6sK13iFpgr8mSIr7ExcXPxYgWNgbQAm/HvLVc8Ivf2FZR8f26i3EyjsLVKpHoyvVMeVdoAHBdQB/NiUslAz2fz3cPXxVSxpZobd7iy/uJNVK1CHvqhGYLrXym9cjfM0BOUhN80NN9SvAiyj9D7PHiwcDBzMSSPljxwF6rPY7BXHq+7AVvzVPTJYEuBVQllYn1RrjCS/Fdr
 cloud@cloudstack

If we understand correctly the public and private key are encrypted in the 
management database. Based on the Source Code
master# ag ssh.publickey 
server/src/com/cloud/server/ConfigurationServerImpl.java
761:                            "VALUES ('Hidden','DEFAULT', 
'management-server','ssh.publickey', '" + DBEncryptionUtil.encrypt(publicKey) +
855:        String pubKey = _configDao.getValue("ssh.publickey");

- It looks like the rsa key is still encrypted on the KVM Server and on the 
System VMs?
- Could this issue be related with the keystore cleanup?
- We are running out of ideas....arrr

regards
Martin





On 08/31/2016 07:31 PM, Rohit Yadav wrote:

Hi Martin,


Looks like something went wrong with the CloudStack generated public/private 
keys. You should not be prompted for any password when trying to ssh to 
systemvms on port 3922. Can you try to destroy the VRs or create a new network 
which would create a new VR.


Regarding the /bin/bash not found issue, that looks fishy. /bin/bash should be 
available on both Ubuntu 14.04 based KVM hosts and also on systemvms (which are 
Debian7 based).


Regards.

________________________________
From: martin kolly <martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch>
Sent: 29 August 2016 20:10:32
To: dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org>
Subject: Re: CS 4.9 NIO Selector wait time PR-1601


We have done more investigations. An error happens when the script
/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh is
executed:

MGMT LOGS:
2016-08-29 16:00:11,342 DEBUG [c.c.a.t.Request]
(AgentManager-Handler-12:null) (logid:) Seq 8-1837187172990452475:
Processing:  { Ans: , MgmtId: 90520741415395, via: 8, Ver: v1, Flags:
10,
[{"com.cloud.agent.api.GetRouterAlertsAnswer":{"result":false,"details":"/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh:
1: /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh:
h!/bin/bash: not found","wait":0}}] }
2016-08-29 16:00:11,345 DEBUG [c.c.a.t.Request]
(RouterStatusMonitor-1:ctx-9cdc1e33) (logid:5f04bf82) Seq
8-1837187172990452475: Received:  { Ans: , MgmtId: 90520741415395, via:
8(kvm702), Ver: v1, Flags: 10, { GetRouterAlertsAnswer } }
*2016-08-29 16:00:11,345 DEBUG [c.c.a.m.AgentManagerImpl]
(RouterStatusMonitor-1:ctx-9cdc1e33) (logid:5f04bf82) Details from
executing class com.cloud.agent.api.routing.GetRouterAlertsCommand:
/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: 1:
/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh:
h!/bin/bash: not found*
2016-08-29 16:00:11,345 WARN
[c.c.n.r.VirtualNetworkApplianceManagerImpl]
(RouterStatusMonitor-1:ctx-9cdc1e33) (logid:5f04bf82) Unable to get
alerts from router r-221-VM
/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: 1:
/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh:
h!/bin/bash: not found
2016-08-29 16:00:11,411 INFO  [o.a.c.e.o.NetworkOrchestrator]
(Network-Scavenger-1:ctx-64537bea) (logid:0c5d8104)
NetworkGarbageCollector uses '600' seconds for GC interval.
2016-08-29 16:00:11,416 DEBUG [o.a.c.e.o.NetworkOrchestrator]
(Network-Scavenger-1:ctx-64537bea) (logid:0c5d8104) We found network 219
to be free for the first time.  Adding it to the list: 1472479211
2016-08-29 16:00:12,392 DEBUG [c.c.s.StatsCollector]
(StatsCollector-6:ctx-c1aeb528) (logid:a364cc05) StorageCollector is
running...
2016-08-29 16:00:12,403 DEBUG [c.c.h.o.r.Ovm3HypervisorGuru]
(StatsCollector-6:ctx-c1aeb528) (logid:a364cc05)
getCommandHostDelegation: class com.cloud.agent.api.GetStorageStatsCommand

For debugging we modified the "router_proxy.sh" slightly on the KVM Host:

kvm# grep -A 2 DEBUG
/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh -n
46:# DEBUG CS Upgrade
*47-echo "$(date):  ssh -p 3922 -q -o StrictHostKeyChecking=no -i $cert
root@$domRIp /opt/cloud/bin/$script $*" >> /tmp/router.txt*
48-ssh -p 3922 -q -o StrictHostKeyChecking=no -i $cert root@$domRIp
"/opt/cloud/bin/$script $*"

kvm# tail /tmp/router.txt
Mon Aug 29 16:10:11 CEST 2016:  ssh -p 3922 -q -o
StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud 
root@169.254.2.67<mailto:root@169.254.2.67>
/opt/cloud/bin/netusage.sh -g
Mon Aug 29 16:10:11 CEST 2016:  ssh -p 3922 -q -o
StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud 
root@169.254.2.67<mailto:root@169.254.2.67>
/opt/cloud/bin/netusage.sh -g

On the router with IP 169.254.2.67 we see the failed logins as well:

root@r-221-VM:~# tail -n 2 /var/log/auth.log
Aug 29 14:10:11 r-221-VM sshd[7074]: Connection closed by 169.254.0.1
[preauth]
Aug 29 14:10:11 r-221-VM sshd[7075]: Connection closed by 169.254.0.1
[preauth]

These commands did not succeed. If we execute the command manually we
get prompted for the key passphrase.

kvm# ssh -p 3922  -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud
root@169.254.2.67<mailto:root@169.254.2.67>
Enter passphrase for key '/root/.ssh/id_rsa.cloud':

can someone confirm that the rsa key is protected with a passhrase? this
looks suspicious to us...

regards
martin







On 08/29/2016 10:09 AM, martin kolly wrote:


thanks Simon and Rohit for the valuable inputs! After applying the
following procedure the ssl errors are gone and host state is UP.

# service cloudstack-management stop

mysql> delete from configuration  where name = "ssl.keystore" ;
# mv /etc/cloudstack/management/cloudmanagementserver.keystore
/etc/cloudstack/management/cloudmanagementserver.keystore.old

# service cloudstack-management start

# file /etc/cloudstack/management/cloudmanagementserver.keystore
/etc/cloudstack/management/cloudmanagementserver.keystore: Java KeyStore
# file /etc/cloudstack/management/cloudmanagementserver.keystore.old
/etc/cloudstack/management/cloudmanagementserver.keystore.old: data

#java -version
java version "1.7.0_111"

However new routers are not deployed, we still see entries that KVM
hosts are unreachable (logs underneath).

After starting a new router it is shown with  "virsh list" on the KVM:
kvm# virsh list | grep r-233-VM
 111   r-233-VM                       running

After some seconds the router is deleted,

kvm# virsh list | grep r-233-VM
#

kvm#virsh  -v
1.2.2

(libvirt was already restarted)



############ Logs/Info KVM  #################

kvm #  dpkg -l  | grep cloudstack
ii  cloudstack-agent
4.9.0                                 all          CloudStack agent
ii  cloudstack-common
4.9.0                                 all          A common package
which contains files which are shared by several CloudStack packages

kvm# df -kh | grep cloud
10.100.12.9:/export/cloud           188G  137G   51G  73%
/mnt/5db02c19-1e8f-3591-bdb4-02608362521e

kvm# tail -f /var/log/cloudstack/agent/agent.log
...
2016-08-29 06:55:06.203+0000: 4425: error :
qemuMonitorFindBalloonObjectPath:1032 : internal error: Cannot determine
balloon device path
2016-08-29 06:55:06.223+0000: 4426: error :
qemuMonitorFindBalloonObjectPath:1032 : internal error: Cannot determine
balloon device path
2016-08-29 06:55:24.062+0000: 4425: warning : qemuDomainObjTaint:1628 :
Domain id=112 name='r-233-VM' uuid=d9dcd37a-242d-43ac-a18e-79a4bfa86ebb
is tainted: high-privileges
2016-08-29 06:55:49.066+0000: 4423: error : qemuMonitorIO:656 : internal
error: End of file from monitor




############ Logs/Info Management Server ########

# dpkg -l  | grep cloudstack
ii  cloudstack-agent
4.9.0                                 all          CloudStack agent
ii  cloudstack-common
4.9.0                                 all          A common package
which contains files which are shared by several CloudStack packages
ii  cloudstack-management
4.9.0                                 all          CloudStack server library
ii  cloudstack-usage
4.9.0                                 all          CloudStack usage monitor

# mysql -u root cloud  -e "select id,name,path from cloud.storage_pool
where pool_type='Filesystem'"
+----+----------------------+-------------------------+
| id | name                 | path                    |
+----+----------------------+-------------------------+
|  1 | kvm704 Local Storage | /var/lib/libvirt/images |
|  2 | kvm701 Local Storage | /var/lib/libvirt/images |
|  3 | kvm702 Local Storage | /var/lib/libvirt/images |
|  4 | kvm703 Local Storage | /var/lib/libvirt/images |
+----+----------------------+-------------------------+

#tail -f /var/log/cloudstack/management/management-server.log | grep -v
DEBUG
...
2016-08-29 08:55:42,484 WARN  [o.a.c.alerts]
(Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
(logid:337b354e)  alertType:: 9 // dataCenterId:: 1 // podId:: 1 //
clusterId:: null // message:: Command:
com.cloud.agent.api.GetDomRVersionCommand failed while starting virtual
router
2016-08-29 08:55:42,494 ERROR
[c.c.n.r.VirtualNetworkApplianceManagerImpl]
(Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
(logid:337b354e) GetDomRVersionCmd failed
2016-08-29 08:55:42,495 WARN
[c.c.n.r.VirtualNetworkApplianceManagerImpl]
(Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
(logid:337b354e) Command: com.cloud.agent.api.GetDomRVersionCommand
failed while starting virtual router
2016-08-29 08:55:42,495 INFO  [c.c.v.VirtualMachineManagerImpl]
(Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
(logid:337b354e) The guru did not like the answers so stopping
VM[DomainRouter|r-233-VM]
2016-08-29 08:55:49,184 INFO  [o.a.c.f.j.i.AsyncJobManagerImpl]
(AsyncJobMgr-Heartbeat-1:ctx-5bd7c179) (logid:86c0376e) Begin cleanup
expired async-jobs
2016-08-29 08:55:49,192 INFO  [o.a.c.f.j.i.AsyncJobManagerImpl]
(AsyncJobMgr-Heartbeat-1:ctx-5bd7c179) (logid:86c0376e) End cleanup
expired async-jobs
2016-08-29 08:55:50,861 ERROR [c.c.v.VirtualMachineManagerImpl]
(Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
(logid:337b354e) Failed to start instance VM[DomainRouter|r-233-VM]
com.cloud.utils.exception.ExecutionException: Unable to start
VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart,
not retrying
    at
com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084)
    at
com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
    at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at
com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
    at
com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753)
    at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
    at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)
    at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)
    at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2016-08-29 08:55:51,136 ERROR [c.c.v.VmWorkJobHandlerProxy]
(Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
(logid:337b354e) Invocation exception, caused by:
com.cloud.exception.AgentUnavailableException: Resource [Host:8] is
unreachable: Host 8: Unable to start instance due to Unable to start
VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart,
not retrying
2016-08-29 08:55:51,136 INFO  [c.c.v.VmWorkJobHandlerProxy]
(Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
(logid:337b354e) Rethrow exception
com.cloud.exception.AgentUnavailableException: Resource [Host:8] is
unreachable: Host 8: Unable to start instance due to Unable to start
VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart,
not retrying
2016-08-29 08:55:51,137 ERROR [c.c.v.VmWorkJobDispatcher]
(Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739) (logid:337b354e)
Unable to complete AsyncJobVO {id:2739, userId: 2, accountId: 2,
instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart,
cmdInfo:
rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAACAAAAAAAAAAIAAAAAAAAA6XQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwc3IAEWphdmEudXRpbC5IYXNoTWFwBQfawcMWYNEDAAJGAApsb2FkRmFjdG9ySQAJdGhyZXNob2xkeHA_QAAAAAAADHcIAAAAEAAAAAF0AA5SZXN0YXJ0TmV0d29ya3QAP3JPMEFCWE55QUJGcVlYWmhMbXhoYm1jdVFtOXZiR1ZoYnMwZ2NvRFZuUHJ1QWdBQldnQUZkbUZzZFdWNGNBRXhw,
cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
result: null, initMsid: 90520741415395, completeMsid: null, lastUpdated:
null, lastPolled: null, created: Mon Aug 29 08:55:21 CEST 2016}, job
origin:2738
com.cloud.exception.AgentUnavailableException: Resource [Host:8] is
unreachable: Host 8: Unable to start instance due to Unable to start
VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart,
not retrying
    at
com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1120)
    at
com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
    at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at
com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
    at
com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753)
    at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
    at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)
    at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)
    at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.cloud.utils.exception.ExecutionException: Unable to
start  VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in
finalizeStart, not retrying
    at
com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084)
    at
com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
    at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source)
    ... 17 more
2016-08-29 08:55:51,168 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
(Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739) (logid:337b354e)
Remove job-2739 from job monitoring
2016-08-29 08:55:53,197 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
(Work-Job-Executor-113:ctx-6411b64b job-2738/job-2740) (logid:db064d99)
Add job-2740 into job monitoring
2016-08-29 08:55:53,218 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
(Work-Job-Executor-113:ctx-6411b64b job-2738/job-2740) (logid:337b354e)
Remove job-2740 from job monitoring
2016-08-29 08:55:53,247 ERROR [c.c.a.ApiAsyncJobDispatcher]
(API-Job-Executor-68:ctx-d2eb7557 job-2738) (logid:337b354e) Unexpected
exception while executing
org.apache.cloudstack.api.command.admin.router.StartRouterCmd
com.cloud.exception.AgentUnavailableException: Resource [Host:8] is
unreachable: Host 8: Unable to start instance due to Unable to start
VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart,
not retrying
    at
com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1120)
    at
com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
    at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at
com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
    at
com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753)
    at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
    at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)
    at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)
    at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.cloud.utils.exception.ExecutionException: Unable to
start  VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in
finalizeStart, not retrying
    at
com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084)
    at
com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
    at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source)
    ... 17 more
2016-08-29 08:55:53,256 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
(API-Job-Executor-68:ctx-d2eb7557 job-2738) (logid:337b354e) Remove
job-2738 from job monitoring




regards
Martin




On 08/26/2016 08:47 PM, Rohit Yadav wrote:


Hi Martin,


I checked my openssl connect output and my management server on port 8250 was 
indeed returning valid certificates. On further investigation, when connections 
to management server (NioServer, handled by Link class that reads 
/etc/cloudstack/management/cloudmanagementserver.keystore) are received it uses 
a keystore file (/etc/cloudstack/management/cloudmanagementserver.keystore). 
Please check if that's valid, or backup your keystore file and do what Simon 
has suggested.



For reference, here the output from my 4.9.0 mgmt server running on Ubuntu 
14.04.4:


$ openssl s_client -tls1 -connect 192.168.1.11:8250
CONNECTED(00000003)
depth=0 C = Unknown, O = bluebox, OU = bluebox, CN = Cloudstack User
verify error:num=18:self signed certificate
verify return:1
depth=0 C = Unknown, O = bluebox, OU = bluebox, CN = Cloudstack User
verify return:1
---
Certificate chain
 0 s:/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User
   i:/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIDPzCCAiegAwIBAgIEFHHX2DANBgkqhkiG9w0BAQsFADBQMRAwDgYDVQQGEwdV
bmtub3duMRAwDgYDVQQKEwdibHVlYm94MRAwDgYDVQQLEwdibHVlYm94MRgwFgYD
VQQDEw9DbG91ZHN0YWNrIFVzZXIwHhcNMTUwNTA2MTM0MjMzWhcNMjUwNTAzMTM0
MjMzWjBQMRAwDgYDVQQGEwdVbmtub3duMRAwDgYDVQQKEwdibHVlYm94MRAwDgYD
VQQLEwdibHVlYm94MRgwFgYDVQQDEw9DbG91ZHN0YWNrIFVzZXIwggEiMA0GCSqG
SIb3DQEBAQUAA4IBDwAwggEKAoIBAQC0wN8kfOJMzwlbrOnBj/jjvjjIwDVpYLtH
WoKkNB+rzzKEUxaYwaQxe6E3M536ZuqcaJBIqcYPwTIkWyulvuHuJpSQak4VbuDV
f7dqt5RacLFT0jUciqTvL5QDCrk0uNugKkWgEvNtokVGSBwLPVEcdcGWpku1EpeH
vMYmpOkcWgbC8Z9D7QTlVw6oEWbPAtKr+gDrXdOFnpPPI45rteatIIgKm1Q6JjZM
qrUKfqt7s8ts6ZgdAN2WmtieSsnUX1su9SJMYg2J8LK7UJeGqiNtE+g944GPqtnW
aDldjharq54e79ug2ktxw29I3ulpRD/vgxwZmcPJrePUKwY91KEjAgMBAAGjITAf
MB0GA1UdDgQWBBSUJvLY8RL/1fTVqj1rT8136da4yzANBgkqhkiG9w0BAQsFAAOC
AQEAgDdFIgLvOH/UgRp2nnFUVcMp+uchSLj8CbCkukJBrUwrmJHp3Os+H1ggk8Vt
j3conj06zJBNN/E0J8pcpagE1aR+l4R8WxF3g/Oc7bNyrUlkGSQ82vavg9sEkwHY
eQY/4wj8CprICs9JilgZ6keeWNWgAW1goLZSzGVwz5eE0lPuc2Dg3laR5RsuTxie
dgQhpbOx3UZun+dhuP5NUHc+KWyrNvSZNN8FruO602KWZwm0Hndl7RVbkNEd0kxq
FhFK4Scc2HBrKMUrPTzO1nGCgR1gA015C2MFfmjeW49VTi95WnY8DDG2euUYAtpl
lLVuNxJxq7eDJfP/M9kxSIKgrQ==
-----END CERTIFICATE-----
subject=/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User
issuer=/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User
---
No client certificate CA names sent
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 1325 bytes and written 331 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-SHA
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1
    Cipher    : ECDHE-RSA-AES256-SHA
    Session-ID: 57C08E5D841D9B74AB533A0000BDBE4275424640B5B7D1985D438B275598436F
    Session-ID-ctx:
    Master-Key: 
E1DE7D7464766DCECD82088561A7D928519CA20BA91FDF4AA38BB2E51AB7C27D3F1ABA5788ADED2AD49FE8597EC4A344
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1472237149
    Timeout   : 7200 (sec)
    Verify return code: 18 (self signed certificate)
---




Regards.

________________________________
From: martin kolly <martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch>
Sent: 26 August 2016 20:36:02
To: dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org>
Subject: Re: CS 4.9 NIO Selector wait time PR-1601


good point, thanks Simon! with openssl we receive a response on port 8250.

# telnet 10.100.12.10 8250
Trying 10.100.12.10...
Connected to 10.100.12.10.
Escape character is '^]'.
Connection closed by foreign host.

# nc -zv 10.100.12.10 8250
Connection to 10.100.12.10 8250 port [tcp/*] succeeded!

# openssl   s_client -tls1  -connect 10.100.12.10:8250
CONNECTED(00000003)
write:errno=104
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 0 bytes and written 0 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1
    Cipher    : 0000
    Session-ID:
    Session-ID-ctx:
    Master-Key:
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1472223447
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
---


On 08/26/2016 04:49 PM, Simon Weller wrote:


Martin,


Are you able to actually telnet to 8250 from the host to the mgmt server?


- Si


________________________________
From: martin kolly <martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch>
Sent: Friday, August 26, 2016 9:41 AM
To: dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org>
Subject: Re: CS 4.9 NIO Selector wait time PR-1601

Hi Rohit

We highly appreciate your efforts! Unfortunately it still does not work.
- ulimit is increased on mgmt server
- jar file replaced
- we confirm that cloudstack-agent 4.9.0 is installed

MGMT Server
# wget 
https://github.com/rhtyd/cloudstack/releases/download/4.9.0-nioinbound/cloud-utils-4.9.0.jar
 -O cloud-utils-4.9.0.jar.patch
# md5sum cloud-utils-4.9.0.jar.patch
c4496f42cc6741f562ac645c3a3d8a0c  cloud-utils-4.9.0.jar.patch
# md5sum 
/usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar
c4496f42cc6741f562ac645c3a3d8a0c  
/usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar

# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 64109
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10240
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 64109
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

KVM Server
# wget 
https://github.com/rhtyd/cloudstack/releases/download/4.9.0-nioinbound/cloud-utils-4.9.0.jar
 -O cloud-utils-4.9.0.jar.patch
# md5sum cloud-utils-4.9.0.jar.patch
c4496f42cc6741f562ac645c3a3d8a0c  cloud-utils-4.9.0.jar.patch
# md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
c4496f42cc6741f562ac645c3a3d8a0c  
/usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar

# apt-cache policy cloudstack-agent
cloudstack-agent:
  Installed: 4.9.0
  Candidate: 4.9.0
  Version table:
 *** 4.9.0 0
        500 http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ 
Packages
        100 /var/lib/dpkg/status

The logs are attached. By the way: the error message "Caught the Exception in 
VmIpFetchTask" was already there with 4.8 release.

Thanks
Martin

On 08/25/2016 07:44 PM, Rohit Yadav wrote:

Hi Martin,


Thanks for sharing. Alright, I'm not sure what's causing issue but based on the 
logs seems like only KVM agents are having issues while connecting to mgmt 
server as I don't see any Nio related exceptions in the management server logs.


I could not see the cloudstack-agent version in the logs, I'm assuming that 
they were all upgraded to 4.9.0, and there are no conflicting jars at 
/usr/share/cloudstack-agent/lib.


First, can you make sure mgmt server has enough ulimit. I found that 
Ubuntu/Debian's init.d script don't override this while CentOS initd/systemd 
script sets ulimit. On your mgmt server, edit /etc/init.d/cloudstack-management 
and add ulimit -n 10240 just before the mgmt server is started in the 'state' 
section (for me it was at around line #147 where it logs a message that it's 
starting the cloudstack-management server).


Next, if this still does not solve the issue -- I created a special 
cloud-utils.jar for you that you need to place on your mgmt server and on the 
KVM agents and restart the mgmt server. This will increase verbosity of the 
error while reduce the Nio polling loop timeout (from 100ms to 10ms). On KVM 
agents, the error from the logs is that during SSL handshake inbound 
connection/stream gets closed, and we want to know the exception message. 
Please get the jar from here:

https://github.com/rhtyd/cloudstack/releases/tag/4.9.0-nioinbound and place 
them at:

/usr/share/cloudstack-agent/lib/ (on kvm host)

/usr/share/cloudstack-management/webapps/client/WEB-INF/lib/ (on mgmt server 
host)


Let me know what worked for you, and if it still failed can you share the mgmt 
server and agent logs once again. Thanks.


Regards.

________________________________
From: martin kolly 
<martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch>
Sent: 25 August 2016 20:50:08
To: 
dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org>
Subject: Re: CS 4.9 NIO Selector wait time PR-1601

Hi Rohit

We are running java version 1.7.0.111 on KVM and management server.
mgmt# java -version
java version "1.7.0_111"
kvm# java -version
java version "1.7.0_111"

We get the same error message. Attached are the logs with TRACE enabled.

"success consists of going from failure to failure without loss of enthusiasm."

regards
martin

On 08/25/2016 02:02 PM, Rohit Yadav wrote:

Hi Martin,


Thanks for sharing, on the surface there does not seem to be any issue in 
configuration causing the failures. I'm personally running KVM and Ubuntu hosts 
based env without issues, I'm on Ubuntu 14.04.4 (Linux bluebox 
3.16.0-45-generic #60~14.04.1-Ubuntu) and java 1.7.0_79. Can you try upgrading 
your JRE7 to latest (openjdk-7-jre, 7u111-2.6.7-0ubuntu0.14.04.3) on all mgmt 
server and kvm hosts?


If upgrading your JRE does not help, can you increase the logging verbosity for 
both the agent and management server (in /etc/cloudstack/{agent, management} 
there would be a log4j file, edit that and replace DEBUG/INFO with TRACE for 
class/keys com.cloud and org.apache.cloudstack) and re-share logs when the 
failures occur? I want to see what additional information we can get from logs 
when it tries to connect to host 10.100.12.10 on port: 8250.


Regards.

________________________________
From: martin kolly 
<martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch>
Sent: 25 August 2016 17:11:06
To: 
dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org>
Subject: Re: CS 4.9 NIO Selector wait time PR-1601


@Simon: We have one management server with local DB. KVMs connect
directly to the management server without any security/loadbalancing
device.

Thanks
Martin

On 08/25/2016 12:41 PM, Simon Weller wrote:


Martin,

Can you provide more detail about your haproxy setup?
Are you running it on separate servers, or on the management server itself?

- Si

Simon Weller/ENA
(615) 312-6068

rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com>
www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




rohit.ya...@shapeblue.comĀ 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 


-----Original Message-----
From: martin kolly 
[martin.ko...@senselan.ch<mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch>]
Received: Thursday, 25 Aug 2016, 5:04AM
To: Rohit Yadav 
[rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com>];
 
dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org>
 
[dev@cloudstack.apache.org<mailto:dev@cloudstack.a%0Apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org>]
Subject: Re: CS 4.9 NIO Selector wait time PR-1601


thanks for your reply.

This morning we repeated the upgrade process from 4.8 to 4.9 with the
following repository:
http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/.
<http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/>
Unfortunately we run into the same issue:

/2016-08-25 09:49:00,660 INFO  [utils.nio.NioClient] (main:null)
(logid:) Connecting to 10.100.12.10:8250//
//2016-08-25 09:49:00,668 WARN  [utils.nio.Link] (main:null) (logid:)
This SSL engine was forced to close inbound due to end of stream.//
//2016-08-25 09:49:00,668 ERROR [utils.nio.NioClient] (main:null)
(logid:) SSL Handshake failed while connecting to host: 10.100.12.10
port: 8250//
//2016-08-25 09:49:00,668 ERROR [utils.nio.NioConnection] (main:null)
(logid:) Unable to initialize the threads.//
//java.io.IOException: SSL Handshake failed while connecting to host:
10.100.12.10 port: 8250//
//    at com.cloud.utils.nio.NioClient.init(NioClient.java:67)//
//    at com.cloud.utils.nio.NioConnection.start(NioConnection.java:88)//
//    at com.cloud.agent.Agent.start(Agent.java:237)//
//    at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399)//
//    at
com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:367)//
//    at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:351)//
//    at com.cloud.agent.AgentShell.start(AgentShell.java:456)//
//    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)//
//    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)//
//    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)//
//    at java.lang.reflect.Method.invoke(Method.java:606)//
//    at
org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)//
//2016-08-25 09:49:00,669 INFO  [utils.exception.CSExceptionErrorCode]
(main:null) (logid:) Could not find exception:
com.cloud.utils.exception.NioConnectionException in error code list for
exceptions//
//2016-08-25 09:49:00,669 WARN  [cloud.agent.Agent] (main:null) (logid:)
NIO Connection Exception
com.cloud.utils.exception.NioConnectionException: SSL Handshake failed
while connecting to host: 10.100.12.10 port: 8250//
//2016-08-25 09:49:00,670 INFO  [cloud.agent.Agent] (main:null) (logid:)
Attempted to connect to the server, but received an unexpected
exception, trying again.../

*KVM Hosts:
*/# java -version
java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)

# dpkg --get-selections | grep -e 'jdk' -e 'java'
ca-certificates-java                install
java-common                    install
libcommons-daemon-java                install
openjdk-7-jre-headless:amd64            install
tzdata-java                    install

# apt-cache policy cloudstack-agent
cloudstack-agent:
  Installed: 4.9.0
  Candidate: 4.9.0
  Version table:
 *** 4.9.0 0
        500
http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ Packages
        100 /var/lib/dpkg/status

# find /usr/share/ -name "cloud-utils*.jar"
/usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
# md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
a8de7306d7c80b5a73e93b83afdd119f
/usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar


/*Management Server:
*/# java -version//
//java version "1.7.0_95"//
//OpenJDK Runtime Environment (IcedTea 2.6.4)
(7u95-2.6.4-0ubuntu0.14.04.1)//
//OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)//
//
//# dpkg --get-selections | grep -e 'jdk' -e 'java'//
//ca-certificates-java                install//
//java-common                    install//
//libcommons-collections3-java            install//
//libcommons-daemon-java                install//
//libcommons-dbcp-java                install//
//libcommons-pool-java                install//
//libecj-java                    install//
//libgeronimo-jta-1.1-spec-java            install//
//libmysql-java                    install//
//libservlet2.5-java                install//
//libtomcat6-java                    install//
//openjdk-7-jre-headless:amd64            install//
//tzdata-java                    install//
//
//# apt-cache policy cloudstack-management//
//cloudstack-management://
//  Installed: 4.9.0//
//  Candidate: 4.9.0//
//  Version table://
// *** 4.9.0 0//
//        500
http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ Packages//
//        100 /var/lib/dpkg/status///

/# find /usr/share/ -name "cloud-utils*.jar"//
///usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar//
///usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar//
///usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar//
//# md5sum
/usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar//
//a8de7306d7c80b5a73e93b83afdd119f
/usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar//
//# md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar//
//a8de7306d7c80b5a73e93b83afdd119f
/usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar//
//# md5sum /usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar//
//a8de7306d7c80b5a73e93b83afdd119f
/usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar/

The classpath.conf was not modified:
/# cat /etc/cloudstack/management/classpath.conf
#!/bin/bash
#...

SYSTEMJARS=""
SCP=$(build-classpath $SYSTEMJARS 2>/dev/null) ; if [ $? != 0 ] ; then
export SCP="" ; fi
MCP=""
DCP="/usr/share/tomcat6/bin/bootstrap.jar:/usr/share/tomcat6/bin/tomcat-juli.jar"
CLASSPATH=$SCP:$DCP:$MCP:/etc/cloudstack/management:/usr/share/cloudstack-management/setup
for jarfile in ""/* ; do
    if [ ! -e "$jarfile" ] ; then continue ; fi
    CLASSPATH=$jarfile:$CLASSPATH
done
for plugin in ""/* ; do
    if [ ! -e "$plugin" ] ; then continue ; fi
    CLASSPATH=$plugin:$CLASSPATH
done
for vendorconf in "/etc/cloudstack/management"/vendor/* ; do
    if [ ! -d "$vendorconf" ] ; then continue ; fi
    CLASSPATH=$vendorconf:$CLASSPATH
done
export CLASSPATH
if ([ -z "$JAVA_HOME" ] || [ ! -d "$JAVA_HOME" ]) && [ -d
/usr/lib/jvm/jre-1.7.0 ]; then
     export JAVA_HOME=/usr/lib/jvm/jre-1.7.0
fi
PATH=$JAVA_HOME/bin:/sbin:/usr/sbin:$PATH
export PATH/

Regards
Martin

On 08/24/2016 06:56 PM, Rohit Yadav wrote:


Martin,


Were you able to fix your issue after installing packages from the
repo Will shared and restarting the services?

I've not personally tested the apt-get.eu repo, but I had earlier
built this repo which I'm personally using in my local KVM-trusty
based cloud: http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/


If you're still getting the error, can you share the JRE version
you're running, both on the mgmt server and on the KVM hosts? You can
run java -version, or share output of "dpkg --get-selections | grep -e
'jdk' -e 'java'". Are you running CloudStack with any additional plugins?

>From the logs, looks like there are mixed jar files,
NioConnectionException class was not found -- something's wrong with
your installation. there must be a cloud-utils jar file make sure your
installation don't have multiple copies/versions of jars
(somewhere) in the in /usr/share/cloudstack-common and in
/usr/share/cloudstack-management/webapps/client/ paths:

Could not find exception:
com.cloud.utils.exception.NioConnectionException in error code list for
exceptions
The error "Unable to initialize the threads." suggests, JVM was not
able to spawn threads. I would like to know your JRE version and any
other settings configured in /etc/cloudstack/management/classpath.conf
(and there are bunch of other files where JAVA_OPTS might have been
overridden). Note: For now you should only be using JRE1.7.


Regards.

rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com>
www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com>
@shapeblue




------------------------------------------------------------------------
*From:* martin kolly 
<martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch>
*Sent:* 24 August 2016 19:53:26
*To:* 
dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org>;
 Rohit Yadav
*Subject:* Re: CS 4.9 NIO Selector wait time PR-1601

Thanks Will!

yes the repo is pointing to 4.9 release for all KVMs and for the
management server:
/cloudstack:~# cat /etc/apt/sources.list.d/cloudstack.list //
//deb http://cloudstack.apt-get.eu/ubuntu trusty 4.9/

All KVM agents and the mgmt server are upgraded to release 4.9 based
on the documentation.We have restarted all the cloudstack-agents and
the cloudstack-management service as well.

Network traces are showing packets from KVM <-> Mgmt on port 8250.
there is no security device in between.

thanks
fanfarlo




On 08/24/2016 04:13 PM, Will Stevens wrote:


@rohit, I am guessing they should be installing the cloudstack-agent using
the following repo right?  That is what is described in the upgrade (trusty
instead of precise though).

http://cloudstack.apt-get.eu/ubuntu/dists/trusty/4.9/

@fanfarlo, are your repo's setup to point to the new 4.9 version?

cheers,

will

On Wed, Aug 24, 2016 at 9:46 AM, Rohit Yadav 
<rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com>
wrote:



The PR and fix already exists in 4.9.0 release. Please make sure to
upgrade all of your management server(s) and KVM agents and then also
restart them after the upgrade.


If you are seeing SSL handshake failures, it could be due to network or
security issue and most likely due to mismatch between CloudStack mgmt
server and KVM agent version.


Regards.

rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com>
www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com>
@shapeblue



------------------------------
*From:* Will Stevens 
<williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com>
*Sent:* 24 August 2016 18:17:17
*To:* 
dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org>;
 Rohit Yadav
*Subject:* Re: CS 4.9 NIO Selector wait time PR-1601


That PR is already merged, so you don't have to do anything to get that
code, you already have it.

@rohit, can you review this?  I think this is a similar to the issue Simon
reported earlier.

Will

On Aug 24, 2016 6:56 AM, "fanfarlo" 
<fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com>
 wrote:



hi all

We have the following environment:
- OS: Debian 14.04 (hypervisors and management)
- 4 KVM Hosts
- Cloudstack Release 4.9 with local database

Since we upgraded to Release 4.9 the KVM hosts no longer connect to the
management Server. Upgrade procedure was followed as described:
http://docs.cloudstack.apache.org/projects/cloudstack-releas
e-notes/en/4.9.0/upgrade/upgrade-4.8.html


On the KVM hosts we have the following error message:
/2016-08-24 10:42:49,678 INFO  [utils.exception.CSExceptionErrorCode]
(main:null) (logid:) Could not find exception:
com.cloud.utils.exception.NioConnectionException in error code list for
exceptions
2016-08-24 10:42:49,678 WARN  [cloud.agent.Agent] (main:null) (logid:)
NIO Connection Exception
com.cloud.utils.exception.NioConnectionException: SSL Handshake failed
while connecting to host: 10.100.12.10 port: 8250
2016-08-24 10:42:49,678 INFO  [cloud.agent.Agent] (main:null) (logid:)
Attempted to connect to the server, but received an unexpected
exception, trying again...
2016-08-24 10:42:54,679 INFO  [utils.nio.NioClient] (main:null) (logid:)
Connecting to 10.100.12.10:8250
2016-08-24 10:42:54,684 WARN  [utils.nio.Link] (main:null) (logid:) This
SSL engine was forced to close inbound due to end of stream.
2016-08-24 10:42:54,684 ERROR [utils.nio.NioClient] (main:null) (logid:)
SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
2016-08-24 10:42:54,685 ERROR [utils.nio.NioConnection] (main:null)
(logid:) Unable to initialize the threads.
java.io.IOException: SSL Handshake failed while connecting to host:
10.100.12.10 port: 8250
    at com.cloud.utils.nio.NioClient.init(NioClient.java:67)
    at com.cloud.utils.nio.NioConnection.start(NioConnection.java:88)
    at com.cloud.agent.Agent.start(Agent.java:237)
    at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399)
    at
com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:367)
    at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:351)
    at com.cloud.agent.AgentShell.start(AgentShell.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
ssorImpl.java:57)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
thodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at
org.apache.commons.daemon.support.DaemonLoader.start(DaemonL
oader.java:243)
2016-08-24 10:42:54,685 INFO  [utils.exception.CSExceptionErrorCode]
(main:null) (logid:) Could not find exception:
com.cloud.utils.exception.NioConnectionException in error code list for
exceptions
2016-08-24 10:42:54,685 WARN  [cloud.agent.Agent] (main:null) (logid:)
NIO Connection Exception
com.cloud.utils.exception.NioConnectionException: SSL Handshake failed
while connecting to host: 10.100.12.10 port: 8250
2016-08-24 10:42:54,686 INFO  [cloud.agent.Agent] (main:null) (logid:)
Attempted to connect to the server, but received an unexpected
exception, trying again.../


Port is open on the management server, there is no firewall in between.
We found that there was a bug report here:
https://issues.apache.org/jira/browse/CLOUDSTACK-9348. There is a PR
changing the NIO Selector wait time:
https://github.com/apache/cloudstack/pull/1601 which was merged into
master branch.

Since we installed Release 4.9 we probably need to patch the
NioConection.class as described in PR1601 , right?

kvm03# unzip -v /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar  |
grep NioConnection
    3923  Defl:N     1778  55% 2016-08-02 09:28 05aaf7d5
com/cloud/utils/nio/NioConnection$1.class
     881  Defl:N      495  44% 2016-08-02 09:28 e378984c
com/cloud/utils/nio/NioConnection$ChangeRequest.class
   15410  Defl:N     7130  54% 2016-08-02 09:28 b3281f5a
com/cloud/utils/nio/NioConnection.class
    1134  Defl:N      584  49% 2016-08-02 09:28 8d5cb4a8
com/cloud/utils/exception/NioConnectionException.class

Due to a lack of java expertise we have some basic questions:
- Is there a patched jar file available ? public build server?
- Do we need to create the jar from sources ? procedure?
- How do we apply the patch ?

many thanks!
fanfarlo








rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com>
www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue












rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com>
www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue







rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com>
www.shapeblue.com<http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue







Reply via email to