GitHub user oddshocks opened a pull request:
https://github.com/apache/libcloud/pull/331
Add a delay to SSH connection to fix the deploy_node race condition
This patch might look a little "hackish", but it has completely resolved the
`deploy_node` race condition in my testing: I've been using libcloud with this
patch for a few days without a single failure. The `timeout` argument for
`_ssh_client_connect` seems to be insufficient. It is set to 300 seconds by
default, but the entire operation fails long before that much time has passed,
so the timeout cannot be what guards against the `deploy_node` race condition.
*This* fix, however, resolves the issue. 60 seconds is more than enough time
to get the SSH key installed onto the node, even with the recent addition of
`ssh_alternate_usernames`, which we suspect to be the cause of this new race
condition.
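For readers who want the idea in code, here is a minimal sketch of the
approach described above: wait a fixed period before the first SSH attempt,
then keep retrying until a timeout expires. It is an illustration only, not
the patch itself (the actual change is in the .patch URL below). The function
name `connect_after_delay` and its `connect`, `poll_interval`, `sleep`, and
`clock` parameters are hypothetical and are not part of libcloud.

```python
import time


def connect_after_delay(connect, initial_delay=60, timeout=300,
                        poll_interval=5, sleep=time.sleep,
                        clock=time.monotonic):
    """Wait `initial_delay` seconds, then call `connect()` until it
    succeeds or `timeout` seconds have passed since the first attempt.

    `connect` is any zero-argument callable that raises on failure
    (a stand-in for the real SSH connection step).
    """
    # Give the node time to finish installing the SSH key.
    sleep(initial_delay)
    deadline = clock() + timeout
    while True:
        try:
            return connect()
        except Exception:
            # Re-raise the last error once the retry window is used up.
            if clock() >= deadline:
                raise
            sleep(poll_interval)
```

The `sleep` and `clock` arguments exist only so the behaviour can be checked
without actually waiting; in normal use the defaults are enough.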
Please let me know if there's anything I can do to improve this patch and get
it merged. This is a critical bug that needs to be resolved quickly.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/oddshocks/libcloud fix-deploy-race-condition
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/libcloud/pull/331.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #331
----
commit 05c846285c40bcc8e52e25a247d07315092c1f1d
Author: David Gay <[email protected]>
Date: 2014-06-28T18:41:17Z
Add a delay to SSH connection to fix the deploy_node race condition
----