David Scott created CLOUDSTACK-6621:
---------------------------------------
Summary: Intermittent failure when management server connects to
hypervisor via ssh
Key: CLOUDSTACK-6621
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6621
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: Management Server
Affects Versions: 4.5.0
Environment: I'm running a management server locally (from master c/s
6511b96088af75b7e37a5f8b0cce609b006021fb) and attempting to add a CentOS 6.4
host via the libvirt/KVM plugin
Reporter: David Scott
The management server attempts to verify the presence of KVM by using ssh to
talk to the host via sshExecuteCmd:
https://github.com/apache/cloudstack/blob/master/utils/src/com/cloud/utils/ssh/SSHCmdHelper.java#L63
The work is done by sshExecuteCmdOneShotWithExitCode (called in a loop):
https://github.com/apache/cloudstack/blob/master/utils/src/com/cloud/utils/ssh/SSHCmdHelper.java#L94
This function waits until either EXIT_STATUS or EOF is set, and then calls
sshSession.getExitStatus. For me this fails with a NullPointerException:
{noformat}
ERROR [c.c.u.s.SSHCmdHelper] (581293855@qtp-1130716142-0:ctx-57482224
ctx-b2286596 ctx-e73d2678) Ssh executed failed
java.lang.NullPointerException
{noformat}
I added some extra logging, and I believe that EOF can be set *before*
EXIT_STATUS, i.e. before the exit status is ready. If we want a readable exit
code, we must wait for EXIT_STATUS specifically.
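
To illustrate the suspected race, here is a minimal, self-contained sketch. It
does not use the real ssh library: ScriptedSession is a hypothetical stand-in
that replays a fixed sequence of channel states, the flag values are
illustrative, and it assumes that getExitStatus returns a nullable Integer
that stays null until the exit status has arrived. Under those assumptions,
waiting on "EOF or EXIT_STATUS" can return early and unboxing the null status
throws, while waiting on EXIT_STATUS alone does not.

```java
class ExitStatusWaitSketch {
    // Illustrative condition flags (assumed values, not taken from the real library).
    static final int TIMEOUT = 1;
    static final int EOF = 16;
    static final int EXIT_STATUS = 32;

    /** Hypothetical stand-in for an ssh session; replays a scripted list of states. */
    static class ScriptedSession {
        private final int exitCode;
        private final int[] states;
        private int pos = 0;

        ScriptedSession(int exitCode, int... states) {
            this.exitCode = exitCode;
            this.states = states;
        }

        /** Advances until a state matches any bit in the mask; TIMEOUT if none does. */
        int waitForCondition(int mask) {
            while (pos < states.length) {
                if ((states[pos] & mask) != 0) {
                    return states[pos];
                }
                pos++;
            }
            return TIMEOUT;
        }

        /** Returns null until the current state carries EXIT_STATUS (assumption). */
        Integer getExitStatus() {
            if (pos < states.length && (states[pos] & EXIT_STATUS) != 0) {
                return Integer.valueOf(exitCode);
            }
            return null;
        }
    }

    /** Racy variant: returns as soon as EOF *or* EXIT_STATUS is set. */
    static int racyExitCode(ScriptedSession session) {
        session.waitForCondition(EOF | EXIT_STATUS);
        return session.getExitStatus(); // unboxing null throws NullPointerException
    }

    /** Safer variant: waits for EXIT_STATUS only and checks for null before unboxing. */
    static int safeExitCode(ScriptedSession session) {
        int conditions = session.waitForCondition(EXIT_STATUS);
        Integer status = session.getExitStatus();
        if ((conditions & EXIT_STATUS) == 0 || status == null) {
            throw new IllegalStateException("no exit status reported");
        }
        return status;
    }

    public static void main(String[] args) {
        // EOF is observed one step before the exit status becomes available.
        try {
            racyExitCode(new ScriptedSession(0, 0, EOF, EOF | EXIT_STATUS));
            System.out.println("racy: no exception");
        } catch (NullPointerException e) {
            System.out.println("racy: NullPointerException");
        }
        int code = safeExitCode(new ScriptedSession(0, 0, EOF, EOF | EXIT_STATUS));
        System.out.println("safe: exit code " + code);
    }
}
```

The null check in safeExitCode is a defensive choice for the sketch: if the
exit status never arrives, the caller gets a descriptive error instead of a
bare NullPointerException.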
Perhaps my system has unusual timing, but this hits me every time. Note that
the ssh command is retried several times (e.g. 3), which could hide the bug
for many people.
I've prepared a simple patch which fixes the issue and makes ssh reliable for
me. I'll upload it to Review Board shortly.