I spent some time playing with this today, and the problem seems to come from Net::SSH not being thread safe. [1] The short version is that when the gateway host kicks back the 'host unreachable' message, all of the connection threads lock up/die/go away. The exception wanders up the stack and is handled normally, but all of the other threads stop.
I'm dubious about the possibility of creating a patch for this that doesn't do unnatural things to the code. I'm not sure whether that means I can use cap for what I'm trying to do, but if not, I'll find another way to make things work.

Thanks,
Ben

[1]: http://weblog.jamisbuck.org/2008/3/18/net-ssh-and-thread-safety

On May 21, 3:08 am, Jamis Buck <[EMAIL PROTECTED]> wrote:
> It's an exception. If it pains someone enough to write a patch for it,
> I'd consider applying it, if it doesn't do unnatural things to the code.
>
> - Jamis
>
> On May 20, 2008, at 4:16 PM, David Masover wrote:
>
> > I'm not sure yet whether that's a pattern or an antipattern. If it's
> > a pattern, then maybe we could do something like:
>
> > HOSTS="-foo"
>
> > to remove host foo from whatever the normal host list would be?
>
> > On Tue, May 20, 2008 at 3:03 PM, Jamis Buck <[EMAIL PROTECTED]>
> > wrote:
> > Honestly, I think I'd recommend just removing the server in question
> > from the server list temporarily, running your stuff, and then
> > adding it back. I might consider a patch to capistrano to work
> > around this, but at the same time, capistrano is already
> > ridiculously complex in places.
>
> > - Jamis
>
> > On May 20, 2008, at 1:54 PM, Ben Lavender wrote:
>
> > Ah, oops, err, pardon me for not posting everything I had tried, but
> > alas, :on_error does not do the trick here. The current version is:
>
> > task :add_user, :on_error => :continue do
> >   prompt(:username)
> >   #prompt(:new_password)
> >   begin
> >     run "useradd #{username}"
> >   rescue Exception => error
> >     puts "Caught an error woo woo! It's " + error
> >   end
> > end
>
> > This still dies:
> > /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.1/lib/net/ssh/connection/session.rb:173:in `select': closed stream (IOError)
> > from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.1/lib/net/ssh/connection/session.rb:173:in `process'
> > from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:189:in `initiate_event_loop!'
>
> > In addition, catching the Exception processes the SystemExit on its
> > way up the stack, albeit not gracefully. It's too late to do any
> > good, it seems:
> > ./sysadmin.cap.rb:39:in `+': SystemExit#to_str should return String (TypeError)
> > from ./sysadmin.cap.rb:39:in `load'
> > from /Library/Ruby/Gems/1.8/gems/capistrano-2.3.0/lib/capistrano/configuration/execution.rb:80:in `instance_eval'
>
> > I should also mention I'm using 2.3.0 with capistrano-ext 1.2.0, both
> > freshly updated via gem today.
>
> > I'm new to this, so I'm probably missing something; any ideas?
>
> > Ben
>
> > On May 20, 9:38 pm, Jamis Buck <[EMAIL PROTECTED]> wrote:
> > Ben,
>
> > It sounds like you want the :on_error => :continue option for the
> > task:
>
> > task :add_user, :on_error => :continue do
> >   # ...
> > end
>
> > With that option set, connection errors and runtime errors will be
> > dutifully logged, but capistrano will not abort.
>
> > - Jamis
>
> > On May 20, 2008, at 5:18 AM, Ben Lavender wrote:
>
> > Hi all,
>
> > I'm looking into using Capistrano for system administration as opposed
> > to deployment. I'm having some trouble handling errors.
>
> > As an example, I'm trying to write an add_user task. Easy enough:
>
> > task :add_user do
> >   run "useradd #{username}"
> > end
>
> > The problem is in handling error conditions. For example, right now
> > I'm trying to add an administrator to a number of machines, but one of
> > them is currently offline for maintenance. When I run my task, I get:
>
> > /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8/gems/net-ssh-1.1.2/lib/net/ssh/service/forward/driver.rb:126:in `direct_channel': could not open direct channel for
> > 65530:1425-6:22 (2, No route to host) (Net::SSH::Exception)
>
> > The other machines work fine, and if I use a subset of roles that does
> > not include the affected machine, it's all fine. However, I'd like to
> > be able to specify that this task continue if one of a subset of
> > machines is unavailable (since I can run it again, harmlessly,
> > later). Ideally, I'd like to be able to specify the action to be
> > taken for a given kind of exception raised for a task. For this one,
> > for example, I might send an email to my trouble ticket system that
> > useradd failed on a given machine, reminding me to do it later.
>
> > I dug around in cli/execute, and it seems like error handling is done
> > rather statically, by handle_error. Is there an accepted way to do
> > this before I start overwriting that method?
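
A side note on the second trace quoted above: the TypeError appears to come from `"It's " + error`, since `+` needs a String and an exception is not one, and `rescue Exception` also catches SystemExit on its way up. Below is a minimal Ruby sketch of a narrower rescue. It assumes nothing about Capistrano internals; `describe_error` and `attempt` are hypothetical helper names, not part of Capistrano or Net::SSH, and this does nothing about the thread lockup described at the top of this message.

```ruby
# Hypothetical helper (not a Capistrano API): format an error for logging.
def describe_error(error)
  # Interpolation calls to_s on the exception, so no implicit String
  # conversion is needed, unlike `"It's " + error`.
  "Caught an error: #{error.class}: #{error.message}"
end

# Hypothetical stand-in for a task body: runs the block and reports
# whether it succeeded.
def attempt
  yield
  :ok
rescue StandardError => error
  # StandardError excludes SystemExit and Interrupt, so `exit` and
  # Ctrl-C still propagate instead of being swallowed here.
  puts describe_error(error)
  :failed
end

# Example: an IOError is logged and reported as a failure.
attempt { raise IOError, "closed stream" }
```

Whether a narrower rescue is enough inside a real task depends on which exception classes Capistrano and Net::SSH raise there, which this sketch does not try to cover.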
