[capistrano] Hang while trying to execute commands on a (possibly still booting) remote server

Mike Grafton Thu, 21 Oct 2010 01:01:18 -0700

Hello Captains,

We are trying to spin up new EC2 boxes and, when they are awake, log
in and run some commands. We're using Capistrano to do this.

Our approach is to attempt a simple remote command using the 'sudo'
method. If that fails (for instance, if the SSH daemon is not yet
accepting connections), we catch the exception, sleep for a bit, and
try again. Once it succeeds, we know the server is awake and that we
can log in and run our bootstrapping scripts.

Before the SSH daemon wakes up, we get Errno::ECONNREFUSED exceptions,
as you would expect, so we happily retry a few seconds later. At some
point, however, a connection attempt hangs, and while it is hung the
Ruby process pegs the CPU. If I then go into another terminal window
and run the cap task that performs the simple remote command, it
works, while the first task is still stuck. If I interrupt the hung
task, the top of the stack trace looks like this:

http://gist.github.com/637388

Some googling turned up a bug that looks quite similar (see the
Lighthouse tickets below), but we're seeing this with the latest
versions of net-ssh, Ruby, and Capistrano. Also suspicious is that the
hang seems to happen only at one particular point in the server's boot
cycle: about 10 seconds after the EC2 instance enters the "running"
state. So it looks like some client/server interaction that occurs
only during that window.

Here's the Lighthouse ticket for Capistrano, marked resolved:

https://capistrano.lighthouseapp.com/projects/8716/tickets/79-capistrano-hangs-on-shell-command-for-many-computers-on-ruby-186-p368

And here's the Lighthouse ticket for net-ssh (issue #1!), also
resolved:

http://net-ssh.lighthouseapp.com/projects/36253/tickets/1

We've tried this with the following rubies:

ruby-1.8.7-p174
ruby-1.8.7-p302
ruby-1.9.1-p378
jruby-1.5.1

We are using Capistrano 2.5.19 and net-ssh 2.0.23. Here's the relevant
part of the deploy.rb:

http://gist.github.com/637389

We're at the end of our rope here. Our only workarounds are some crazy
monkey-patching of Capistrano combined with thread killing and
retrying, or finding another way to know for sure that the server is
up.
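One way to get a firmer "the server is up" signal before handing the host to Capistrano is to wait until the SSH daemon actually sends its identification line, rather than just accepting TCP connections. The sketch below is a hypothetical probe, not a Capistrano or net-ssh API. The method name, the default port and timeout, and the rescued exception list are assumptions. It relies only on the SSH protocol rule (RFC 4253, section 4.2) that the server's version string begins with "SSH-", possibly preceded by other lines.

```ruby
require 'socket'
require 'timeout'

# Hypothetical readiness probe (not a Capistrano or net-ssh API): connect to
# host:port and wait for the server's SSH identification line, which per
# RFC 4253 section 4.2 starts with "SSH-". Other lines may precede it.
# Returns false on refusal, unreachable host, reset, or timeout.
def ssh_banner_ready?(host, port = 22, timeout = 5)
  Timeout.timeout(timeout) do
    sock = TCPSocket.new(host, port)
    begin
      # Read lines until one looks like an SSH version string, or EOF.
      while (line = sock.gets)
        return true if line.start_with?("SSH-")
      end
      false
    ensure
      sock.close
    end
  end
rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH, Errno::ECONNRESET,
       Errno::ETIMEDOUT, Timeout::Error
  false
end
```

A boot script could poll this (for example, `sleep 5 until ssh_banner_ready?(host)`, with `host` standing in for the instance's address) before issuing the first `sudo`. A banner only shows that sshd is answering; the instance may still not be ready to authenticate, so a retry around the first real command is still worthwhile.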

Ideas?

Thanks!
Mike

--
* You received this message because you are subscribed to the Google Groups
"Capistrano" group.
* To post to this group, send email to [email protected]
* To unsubscribe from this group, send email to
[email protected] For more options, visit this group at
http://groups.google.com/group/capistrano?hl=en
