Hello Captains,

We are trying to spin up new EC2 boxes and, when they are awake, log in and run some commands. We're using Capistrano to do this.
We attempt a simple remote command of some sort using the 'sudo' method. If that fails (for instance, if the SSH daemon is not accepting connections yet), we catch the exception, sleep for a bit, and try again. When it succeeds, we know the server is awake and that we can log in and run our bootstrapping scripts.

Before the SSH daemon wakes up, we get Errno::ECONNREFUSED exceptions, as you would expect, and so we happily retry a few seconds later. At some point, however, the attempt to connect hangs, and the ruby process pegs the CPU. If I then go into another terminal window and run the cap task that performs the simple remote command, it works. Meanwhile, the first task is still hung. If I interrupt it, I get the following at the top of the stack trace:

http://gist.github.com/637388

Some googling found a bug that appears quite similar (see the linked lighthouse ticket and blog article below), but we're seeing this with the latest/greatest net-ssh, ruby, and capistrano.

Also suspicious is that this seems to happen only at a particular time in the server's boot cycle: the hang occurs about 10 seconds after the EC2 instance becomes "running". So it sounds like a client/server interaction of some sort that happens only during that particular moment.

Here's the lighthouse ticket for capistrano, marked resolved:

https://capistrano.lighthouseapp.com/projects/8716/tickets/79-capistrano-hangs-on-shell-command-for-many-computers-on-ruby-186-p368

And here's the lighthouse ticket for net-ssh (issue #1!), also resolved:

http://net-ssh.lighthouseapp.com/projects/36253/tickets/1

We've tried this with the following rubies:

  ruby-1.8.7-p174
  ruby-1.8.7-p302
  ruby-1.9.1-p378
  jruby-1.5.1

We are using capistrano 2.5.19 and net-ssh 2.0.23.
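To make the wait-and-retry idea concrete, here is a rough standalone sketch of the same loop done as a plain TCP probe instead of a Capistrano 'sudo' call. It is not the code from our deploy.rb. The names ssh_ready? and wait_for_ssh are made up for illustration, and it assumes a newer Ruby than the ones listed above (as far as I know, Socket.tcp with connect_timeout and keyword arguments need Ruby 2.0 or later). The idea is that every blocking step gets a bounded timeout, so a half-started sshd results in a failed attempt rather than a hang.

```ruby
require "socket"

# Hypothetical helper (not part of Capistrano or net-ssh): true if host:port
# accepts a TCP connection and sends something that looks like an SSH
# identification string ("SSH-...") within the timeout.
def ssh_ready?(host, port = 22, timeout = 5)
  Socket.tcp(host, port, connect_timeout: timeout) do |sock|
    # Bound the wait for the banner; give up if nothing arrives in time.
    return false unless IO.select([sock], nil, nil, timeout)
    # readpartial returns whatever bytes are available without waiting
    # for a full line, so a partial banner cannot block us.
    sock.readpartial(255).start_with?("SSH-")
  end
rescue SystemCallError, SocketError, IOError
  # Connection refused, unreachable, timed out, reset, or closed early.
  false
end

# Hypothetical helper: poll until the probe succeeds or attempts run out.
def wait_for_ssh(host, port: 22, attempts: 30, delay: 5)
  attempts.times do
    return true if ssh_ready?(host, port)
    sleep delay
  end
  false
end
```

Something like wait_for_ssh("ec2-host.example.com") could then run before the first remote command (the hostname is a placeholder). This only shows that the daemon is answering on the port; it says nothing about whether logins will succeed yet.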
Here's the relevant part of the deploy.rb:

http://gist.github.com/637389

We're kind of at the end of our rope here; our only workaround is to try some crazy monkey-patching of capistrano along with some thread killing and retrying. Or, finding another way to know for sure that the server is up.

Ideas?

Thanks!
Mike
