Curiosity has killed the metaphorical cat, the cat, in this case,
being my time. Here's a patch, but you may not like it, as it
involves touching Capistrano, Net::SSH, and Net::SSH::Gateway, and
thus probably qualifies as 'unnatural'. As you wrote all three, I'm
tossing the diffs out here. If you'd be willing to merge it, I'll
clean it up some, update the docs and whatnot.
The root problem here is that Ruby's IO.select raises an exception when
one of the sockets it is given is closed or broken (as happens when the
underlying SSH channel fails), instead of marking that socket as being
in an error state and returning. This means that any Net::SSH::Gateway
that serves multiple connection threads will have the thread running
its process() loop die if any one of the connections is lost or fails
to be created. The following diffs solve the problem by allowing one to
pass Net::SSH::Gateway a block to be called when its session's
process() call raises an exception. Net::SSH doesn't provide enough
information at present to deal with errors raised by IO.select, so it
needs a change, too (and a far better one than the hack here).
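To make the IO.select behaviour concrete, here is a minimal sketch that uses plain pipes rather than SSH sockets. The names good_r, bad_r and outcome are mine, purely for illustration; nothing here is Net::SSH code.

```ruby
# Two pipes stand in for two forwarded connections.
good_r, good_w = IO.pipe
bad_r, bad_w   = IO.pipe

# Simulate one connection going away underneath the event loop.
bad_r.close

outcome =
  begin
    # One closed IO in the read set is enough to make the whole call raise,
    # even though good_r is perfectly healthy.
    IO.select([good_r, bad_r], nil, nil, 0.1)
    :returned
  rescue IOError => e
    e
  end

[good_r, good_w, bad_w].each(&:close)
```

The call never gets as far as reporting which descriptor is bad, which is why the caller cannot tell the failed connection from the healthy ones.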
diffs against git on 27/5/08:
capistrano/configuration/connections.rb:
25c25
< def initialize(gateway, options)
---
> def initialize(gateway, options, ignore_errors)
29a30
> failure_block = ignore_errors ? lambda { } : nil
31c32
< Net::SSH::Gateway.new(host, user, connect_options)
---
> Net::SSH::Gateway.new(host, user, connect_options,failure_block)
83c84
< GatewayConnectionFactory.new(fetch(:gateway), self)
---
> GatewayConnectionFactory.new(fetch(:gateway), self,
> current_task.continue_on_error?)
100c101,110
< threads.each { |t| t.join }
---
> timeout = exists?(:connection_timeout) ?
> fetch(:connection_timeout) : 10
> threads.each { |t| t.join(timeout) }
>
> servers.each do |server|
> if !(sessions[server]) then
> failed_servers << { :server => server, :error =>
> Net::SSH::Exception.new("Failed to connect to #{server}") }
> logger.debug "Failed to connect to `#{server}' via gateway"
> end
> end
>
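The join-with-timeout idea in the connections.rb diff can be sketched on its own, assuming nothing from Capistrano: the server names are plain strings, and "connecting" is simulated with sleep. All identifiers below are hypothetical stand-ins.

```ruby
servers  = ["app1", "app2", "down1"]
sessions = {}
lock     = Mutex.new

threads = servers.map do |server|
  Thread.new do
    sleep(5) if server == "down1" # simulate a connection that hangs
    lock.synchronize { sessions[server] = :connected }
  end
end

# Thread#join(limit) returns nil if the thread is still running after
# `limit` seconds, so a hung connection no longer blocks forever.
timeout = 0.2
threads.each { |t| t.join(timeout) }

# Anything without a session by now is treated as failed.
failed_servers = lock.synchronize { servers.reject { |s| sessions[s] } }

# Clean up the straggler so the sketch exits promptly.
threads.each(&:kill)
```

Note that the timeout applies per thread, so the worst-case wait grows with the number of stuck connections.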
net/ssh/gateway.rb:
68c68,69
< def initialize(host, user, options={})
---
> def initialize(host, user, options={},block=nil)
> @failure_block ||= block
189c190,200
< @session_mutex.synchronize { @session.process(0.1) }
---
> @session_mutex.synchronize {
> begin
> @session.process(0.1)
> rescue Exception => error
> if @failure_block then
> @failure_block.call(@session,error)
> else
> raise error
> end
> end
> }
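The shape of the gateway.rb change can be shown in isolation. This is only a sketch under my own assumptions: FlakySession and LoopOwner are hypothetical stand-ins, not Net::SSH classes, and the sketch rescues StandardError where the diff rescues Exception, so that SystemExit and Interrupt still propagate.

```ruby
# Hypothetical stand-in for a session whose second process() call fails.
class FlakySession
  def initialize
    @calls = 0
  end

  def process(_wait)
    @calls += 1
    raise IOError, "closed stream" if @calls == 2
    @calls
  end
end

# Hypothetical loop owner, loosely modelled on the patched gateway.
class LoopOwner
  def initialize(session, failure_block = nil)
    @session = session
    @failure_block = failure_block
    @mutex = Mutex.new
  end

  # One iteration of the event loop.
  def tick
    @mutex.synchronize do
      begin
        @session.process(0.1)
      rescue StandardError => error
        raise unless @failure_block          # no handler: old behaviour
        @failure_block.call(@session, error) # handler decides what to do
      end
    end
  end
end

errors = []
tolerant = LoopOwner.new(FlakySession.new,
                         lambda { |_session, error| errors << error.message })
3.times { tolerant.tick }

strict = LoopOwner.new(FlakySession.new)
strict_failure =
  begin
    3.times { strict.tick }
    nil
  rescue IOError => e
    e
  end
```

With a handler supplied the loop survives the failure and records it; without one the exception propagates exactly as before.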
net/ssh/service/forward.rb:
79c79
< channel[:socket].close
---
> #channel[:socket].close
forward.rb would need a more generic solution to be correct. As it
stands, closing the socket on error without also closing the socket
built on top of it just makes the Gateway's process() raise exceptions
endlessly. Forward needs to offer another interface for responding to
whatever IO.select() raises. That's another solution I'd be willing to
hammer out, if you're willing to merge this stuff.
Let me know,
Ben
On Wed, May 21, 2008 at 5:57 PM, Ben Lavender <[EMAIL PROTECTED]> wrote:
>
> I spent some time playing with this today, and it seems to be based on
> Net::SSH's not being thread safe. [1] The short version is that when
> the gateway host kicks back the 'host unreachable' message, all of the
> connection threads lock up/die/go away. The exception wanders up the
> stack and is handled normally, but all of the other threads stop.
>
> I'm dubious about the possibility of creating a patch for this that
> doesn't do unnatural things to the code. I'm not sure if that means I
> can use cap or not, for what I'm trying to do, but I'll find another
> way to make things work if I do.
>
> Thanks,
> Ben
>
> [1]: http://weblog.jamisbuck.org/2008/3/18/net-ssh-and-thread-safety
>
> On May 21, 3:08 am, Jamis Buck <[EMAIL PROTECTED]> wrote:
>> It's an exception. If it pains someone enough to write a patch for it,
>> I'd consider applying it, if it doesn't do unnatural things to the code.
>>
>> - Jamis
>>
>> On May 20, 2008, at 4:16 PM, David Masover wrote:
>>
>> > I'm not sure yet whether that's a pattern or an antipattern. If it's
>> > a pattern, then maybe we could do something like:
>>
>> > HOSTS="-foo"
>>
>> > to remove host foo from whatever the normal host list would be?
>>
>> > On Tue, May 20, 2008 at 3:03 PM, Jamis Buck <[EMAIL PROTECTED]>
>> > wrote:
>> > Honestly, I think I'd recommend just removing the server in question
>> > from the server list temporarily, running your stuff, and then
>> > adding it back. I might consider a patch to capistrano to work
>> > around this, but at the same time, capistrano is already
>> > ridiculously complex in places.
>>
>> > - Jamis
>>
>> > On May 20, 2008, at 1:54 PM, Ben Lavender wrote:
>>
>> > Ah, oops, err, pardon me for not posting everything I had tried, but
>> > alas, :on_error does not do the trick here. The current version is:
>>
>> > task :add_user, :on_error => :continue do
>> >   prompt(:username)
>> >   #prompt(:new_password)
>> >   begin
>> >     run "useradd #{username}"
>> >   rescue Exception => error
>> >     puts "Caught an error woo woo! It's " + error
>> >   end
>> > end
>>
>> > This still dies:
>> > /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.1/lib/net/ssh/connection/session.rb:173:in `select': closed stream (IOError)
>> > from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.1/lib/net/ssh/connection/session.rb:173:in `process'
>> > from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:189:in `initiate_event_loop!'
>>
>> > In addition, catching the Exception processes the SystemExit on its
>> > way up the stack, albeit not gracefully. It's too late to do any
>> > good, it seems:
>> > ./sysadmin.cap.rb:39:in `+': SystemExit#to_str should return String (TypeError)
>> > from ./sysadmin.cap.rb:39:in `load'
>> > from /Library/Ruby/Gems/1.8/gems/capistrano-2.3.0/lib/capistrano/configuration/execution.rb:80:in `instance_eval'
>>
>> > I should also mention I'm using 2.3.0 with capistrano-ext 1.2.0, both
>> > freshly updated via gem today.
>>
>> > I'm new to this, so I'm probably missing something; any ideas?
>>
>> > Ben
>>
>> > On May 20, 9:38 pm, Jamis Buck <[EMAIL PROTECTED]> wrote:
>> > Ben,
>>
>> > It sounds like you want the :on_error => :continue option for the
>> > task:
>>
>> > task :add_user, :on_error => :continue do
>> >   # ...
>> > end
>>
>> > With that option set, connection errors and runtime errors will be
>> > dutifully logged, but capistrano will not abort.
>>
>> > - Jamis
>>
>> > On May 20, 2008, at 5:18 AM, Ben Lavender wrote:
>>
>> > Hi all,
>>
>> > I'm looking into using Capistrano for system administration as opposed
>> > to deployment. I'm having some trouble handling errors.
>>
>> > As an example, I'm trying to write an add_user task. Easy enough:
>>
>> > task :add_user do
>> >   run "useradd #{username}"
>> > end
>>
>> > The problem is in handling error conditions. For example, right now
>> > I'm trying to add an administrator to a number of machines, but one of
>> > them is currently offline for maintenance. When I run my task, I get:
>> > /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8/gems/net-ssh-1.1.2/lib/net/ssh/service/forward/driver.rb:126:in `direct_channel': could not open direct channel for 65530:1425-6:22 (2, No route to host) (Net::SSH::Exception)
>>
>> > The other machines work fine, and if I use a subset of roles that does
>> > not include the affected machine, it's all fine. However, I'd like to
>> > be able to specify that this task continue if one of a subset of
>> > machines is unavailable (since I can run it again, harmlessly,
>> > later). Ideally, I'd like to be able to specify the action to be
>> > taken for a given kind of exception raised for a task. For this one,
>> > for example, I might send an email to my trouble ticket system that
>> > useradd failed on a given machine, reminding me to do it later.
>>
>> > I dug around in cli/execute, and it seems like error handling is done
>> > rather statically, by handle_error. Is there an accepted way to do
>> > this before I start overwriting that method?
>>