Thanks for your investigation of this issue, Ben. I'll take a closer
look at what you've got here and see what can be done.
- Jamis
Ben Lavender wrote:
> Apologies for continuing to post here, as this is really a problem
> with Net::SSH::Gateway and Net::SSH, but the thread started here and I
> figured I'd finish.
>
> I thought the problem might be clearer with a test, so here it is. It
> fails against unpatched versions downloaded today. I attempted to
> integrate it into the unit tests of Net::SSH::Gateway, but was unable
> to recreate the behavior with mocha.
>
> Basically, creating a connection with Net::SSH::Gateway after a
> connection has failed will always fail. There may be other failure
> conditions, such as existing connections being broken by a failed
> connection. Are you willing to merge a fix for this, be it something
> along the lines of the previous one or something else entirely? In
> either case, I suspect the fix will not be a simple one.
>
> Macintosh:~ ben$ cat gateway-test.rb
> #!/usr/bin/ruby
> require 'rubygems'
> require 'net/ssh'
> require 'net/ssh/gateway'
>
> GATEWAY="gateway"
> WORKING_HOST="working.host"
> FAILING_HOST="failed.host"
> USER="user"
>
> # true: program dies from exception on Net::SSH::Gateway process thread
> # false: program locks up forever joining with later connections
> Thread.abort_on_exception = true
>
> # order does not matter, but there must be both working and failing hosts
> hosts = [FAILING_HOST, WORKING_HOST]
>
> gateway = Net::SSH::Gateway.new(GATEWAY, USER)
>
> threads = {}
> hosts.each do | host |
>   threads[host] = Thread.new do
>     begin
>       conn = Net::SSH.start('localhost', USER, :port => gateway.open(host, 22))
>     rescue Exception => error
>       # this exception is caught for FAILING_HOST, but there's no way to manage it
>       # other threads will still block or crash
>       puts "Caught error for #{host}: #{error}"
>     end
>   end
> end
>
> # force race condition to appear
> sleep(3)
>
> t = Thread.new do
>   begin
>     conn = Net::SSH.start('localhost', USER, :port => gateway.open(WORKING_HOST, 22))
>   rescue Exception => error
>     puts "Caught error for #{WORKING_HOST}: #{error}"
>   end
> end
>
>
> begin
>   # blocks forever with Thread.abort_on_exception == false
>   puts "Attempting to join with connection attempted after failed connection..."
>   t.join()
>
>   # is sometimes reached with Thread.abort_on_exception == true, but no use
>   hosts.each do | host |
>     puts "joining for #{host}"
>     threads[host].join
>   end
>   # catches SystemExit thrown by Thread.abort_on_exception == true
>   # Gateway process thread is now dead and no longer continues, all
>   # connections block indefinitely when accessed.
> rescue Exception => error
>   puts "caught error: #{error}"
> end
> puts "execution continues, but the gateway no longer processes, all connections block"
>
> Output with Thread.abort_on_exception = false:
>
> Macintosh:~ ben$ ./gateway-test.rb
> Attempting to join with connection attempted after failed connection...
> Caught error for failed_host: end of file reached
> <infinite wait>
>
> Output with Thread.abort_on_exception = true:
>
> Macintosh:~ ben$ ./gateway-test.rb
> Attempting to join with connection attempted after failed connection...
> Caught error for failed_host: end of file reached
> /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.2/lib/net/ssh/connection/session.rb:173:in `select': closed stream (IOError)
> from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.2/lib/net/ssh/connection/session.rb:173:in `process'
> from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:189:in `initiate_event_loop!'
> from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:189:in `synchronize'
> from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:189:in `initiate_event_loop!'
> from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:187:in `initialize'
> from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:187:in `new'
> from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:187:in `initiate_event_loop!'
> from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:74:in `initialize'
> from ./gateway-test.rb:18:in `new'
> from ./gateway-test.rb:18
> caught error: #<SystemExit:0x5268f4>
> execution continues, but the gateway no longer processes, all connections block
>
> Thanks,
> Ben
>
> On May 27, 9:59 pm, "Ben Lavender" <[EMAIL PROTECTED]> wrote:
>> Curiosity has killed the metaphorical cat, the cat, in this case,
>> being my time. Here's a patch, but you may not like it, as it
>> involves touching Capistrano, Net::SSH, and Net::SSH::Gateway, and
>> thus probably qualifies as 'unnatural'. As you wrote all three, I'm
>> tossing the diffs out here. If you'd be willing to merge it, I'll
>> clean it up some, update the docs and whatnot.
>>
>> The root problem here is that Ruby's IO.select will toss errors if a
>> socket throws an error (as they do when the underlying SSH channel is
>> broken), instead of marking it as being in an error state and
>> returning. This means that any Net::SSH::Gateway that has multiple
>> threads will have its process thread die if any one of the connections
>> is lost or fails to be created. The following diffs solve the problem
>> by allowing one to pass Net::SSH::Gateway a block to be called upon an
>> exception being thrown by its session's process() call. Net::SSH
>> doesn't provide enough information at present to deal with any errors
>> sent up by IO.select, so it needs a change, too (and a far better one
>> than the hack here).
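
The failure mode described here can be reproduced without Net::SSH. The sketch below is an illustration only and is not Net::SSH::Gateway code: `EventLoop`, `on_failure`, and `ticks` are hypothetical names, and a closed pipe stands in for a forwarded socket whose channel has gone away. It shows that IO.select raises IOError when handed a closed IO, and that rescuing inside the loop thread and handing the error to a callback keeps that thread running.

```ruby
# Hypothetical stand-in for a gateway-style event loop; not Net::SSH code.
class EventLoop
  attr_reader :ticks

  # on_failure: optional callable invoked with the exception raised by a pass.
  def initialize(ios, on_failure = nil)
    @ios = ios
    @on_failure = on_failure
    @ticks = 0
    @mutex = Mutex.new
  end

  # One pass of the loop. IO.select raises IOError if any IO is closed.
  def process(timeout = 0.01)
    IO.select(@ios, nil, nil, timeout)
  end

  # Run a fixed number of passes in a background thread.
  def run(passes)
    Thread.new do
      passes.times do
        @mutex.synchronize do
          begin
            process
          rescue IOError => error
            # Without a handler the error propagates and ends the thread.
            raise error unless @on_failure
            @on_failure.call(error)
          end
          @ticks += 1
        end
      end
    end
  end
end

reader, _writer = IO.pipe
reader.close # simulate a socket closed underneath the loop

# Calling IO.select directly on the closed IO raises.
select_raised = begin
  IO.select([reader], nil, nil, 0.01)
  false
rescue IOError
  true
end

# With a handler installed, every pass fails but the loop thread survives.
seen = []
event_loop = EventLoop.new([reader], lambda { |e| seen << e.class })
event_loop.run(3).join

puts "select raised: #{select_raised}"
puts "passes completed: #{event_loop.ticks}, errors handled: #{seen.size}"
```

Note that the handler fires on every pass for as long as the closed IO stays in the list, so a real fix would also need to drop or replace the dead socket.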
>>
>> diffs against git on 27/5/08:
>>
>> capistrano/configuration/connections.rb:
>> 25c25
>> < def initialize(gateway, options)
>> ---
>> > def initialize(gateway, options, ignore_errors)
>> 29a30
>> > failure_block = ignore_errors ? lambda { } : nil
>> 31c32
>> < Net::SSH::Gateway.new(host, user, connect_options)
>> ---
>> > Net::SSH::Gateway.new(host, user, connect_options,failure_block)
>>
>> 83c84
>> < GatewayConnectionFactory.new(fetch(:gateway), self)
>> ---
>> > GatewayConnectionFactory.new(fetch(:gateway), self, current_task.continue_on_error?)
>>
>> 100c101,110
>> < threads.each { |t| t.join }
>> ---
>> > timeout = exists?(:connection_timeout) ?
>> > fetch(:connection_timeout) : 10
>> > threads.each { |t| t.join(timeout) }
>> > servers.each do |server|
>> > if !(sessions[server]) then
>> > failed_servers << { :server => server, :error =>
>> > Net::SSH::Exception.new("Failed to connect to #{server}") }
>> > logger.debug "Failed to connect to `#{server}' via gateway"
>> > end
>> > end
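
The bounded join in the hunk above relies on Thread#join accepting a limit in seconds and returning nil when the thread has not finished in time. A minimal sketch of that pattern follows, assuming nothing about Capistrano internals: `connect_all` and `connectors` are hypothetical names, and a bare `sleep` stands in for a connection attempt that never completes.

```ruby
# Hypothetical helper: run one connector per server, wait at most `timeout`
# seconds for each thread, and report which servers never produced a session.
def connect_all(connectors, timeout)
  sessions = {}
  lock = Mutex.new
  threads = connectors.map do |server, connector|
    Thread.new do
      session = connector.call
      lock.synchronize { sessions[server] = session }
    end
  end

  # join(limit) returns nil if the thread is still running after the limit.
  threads.each { |t| t.join(timeout) }

  failed = lock.synchronize { connectors.keys.reject { |s| sessions.key?(s) } }

  # Not part of the patch above: stop threads that are still stuck.
  threads.each { |t| t.kill if t.alive? }
  [sessions, failed]
end

connectors = {
  "working.host" => lambda { :session },
  "failed.host"  => lambda { sleep }, # stand-in for a connect that blocks forever
}
sessions, failed = connect_all(connectors, 0.2)
puts "connected: #{sessions.keys.inspect}"
puts "failed: #{failed.inspect}"
```

Because the joins run one after another, the worst-case wait is the timeout multiplied by the number of stuck threads.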
>> net/ssh/gateway.rb:
>> 68c68,69
>> < def initialize(host, user, options={})
>> ---
>> > def initialize(host, user, options={},block=nil)
>> > @failure_block ||= block
>> 189c190,200
>> < @session_mutex.synchronize { @session.process(0.1) }
>> ---
>> > @session_mutex.synchronize {
>> > begin
>> > @session.process(0.1)
>> > rescue Exception => error
>> > if @failure_block then
>> > @failure_block.call(@session,error)
>> > else
>> > raise error
>> > end
>> > end
>> > }
>> net/ssh/service/forward.rb:
>> 79c79
>> < channel[:socket].close
>> ---
>> > #channel[:socket].close
>>
>> forward.rb would need a more generic solution to be correct, but
>> simply closing the socket on error without closing a socket built over
>> it will simply cause the Gateway's process() to throw exceptions
>> endlessly. Forward needs to offer another interface to respond to what
>> IO.select() throws. That's another solution I'd be willing to hammer
>> out, if you're willing to merge this stuff.
>>
>> Let me know,
>> Ben
>>
>> On Wed, May 21, 2008 at 5:57 PM, Ben Lavender <[EMAIL PROTECTED]> wrote:
>>
>>> I spent some time playing with this today, and it seems to be based on
>>> Net::SSH's not being thread safe. [1] The short version is that when
>>> the gateway host kicks back the 'host unreachable' message, all of the
>>> connection threads lock up/die/go away. The exception wanders up the
>>> stack and is handled normally, but all of the other threads stop.
>>> I'm dubious about the possibility of creating a patch for this that
>>> doesn't do unnatural things to the code. I'm not sure if that means I
>>> can use cap or not, for what I'm trying to do, but I'll find another
>>> way to make things work if I do.
>>> Thanks,
>>> Ben
>>> [1]:http://weblog.jamisbuck.org/2008/3/18/net-ssh-and-thread-safety
>>> On May 21, 3:08 am, Jamis Buck <[EMAIL PROTECTED]> wrote:
>>>> It's an exception. If it pains someone enough to write a patch for it,
>>>> I'd consider applying it, if it doesn't do unnatural things to the code.
>>>> - Jamis
>>>> On May 20, 2008, at 4:16 PM, David Masover wrote:
>>>>> I'm not sure yet whether that's a pattern or an antipattern. If it's
>>>>> a pattern, then maybe we could do something like:
>>>>> HOSTS="-foo"
>>>>> to remove host foo from whatever the normal host list would be?
>>>>> On Tue, May 20, 2008 at 3:03 PM, Jamis Buck <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>> Honestly, I think I'd recommend just removing the server in question
>>>>> from the server list temporarily, running your stuff, and then
>>>>> adding it back. I might consider a patch to capistrano to work
>>>>> around this, but at the same time, capistrano is already
>>>>> ridiculously complex in places.
>>>>> - Jamis
>>>>> On May 20, 2008, at 1:54 PM, Ben Lavender wrote:
>>>>> Ah, oops, err, pardon me for not posting everything I had tried, but
>>>>> alas, :on_error does not do the trick here. The current version is:
>>>>> task :add_user, :on_error => :continue do
>>>>>   prompt(:username)
>>>>>   #prompt(:new_password)
>>>>>   begin
>>>>>     run "useradd #{username}"
>>>>>   rescue Exception => error
>>>>>     puts "Caught an error woo woo! It's " + error
>>>>>   end
>>>>> end
>>>>> This still dies:
>>>>> /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.1/lib/net/ssh/connection/session.rb:173:in `select': closed stream (IOError)
>>>>> from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.1/lib/net/ssh/connection/session.rb:173:in `process'
>>>>> from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:189:in `initiate_event_loop!'
>>>>> In addition, catching the Exception processes the SystemExit on its
>>>>> way up the stack, albeit not gracefully. It's too late to do any
>>>>> good, it seems:
>>>>> ./sysadmin.cap.rb:39:in `+': SystemExit#to_str should return String (TypeError)
>>>>> from ./sysadmin.cap.rb:39:in `load'
>>>>> from /Library/Ruby/Gems/1.8/gems/capistrano-2.3.0/lib/capistrano/configuration/execution.rb:80:in `instance_eval'
>>>>> I should also mention I'm using 2.3.0 with capistrano-ext 1.2.0, both
>>>>> freshly updated via gem today.
>>>>> I'm new to this, so I'm probably missing something; any ideas?
>>>>> Ben
>>>>> On May 20, 9:38 pm, Jamis Buck <[EMAIL PROTECTED]> wrote:
>>>>> Ben,
>>>>> It sounds like you want the :on_error => :continue option for the
>>>>> task:
>>>>> task :add_user, :on_error => :continue do
>>>>> # ...
>>>>> end
>>>>> With that option set, connection errors and runtime errors will be
>>>>> dutifully logged, but capistrano will not abort.
>>>>> - Jamis
>>>>> On May 20, 2008, at 5:18 AM, Ben Lavender wrote:
>>>>> Hi all,
>>>>> I'm looking into using Capistrano for system administration as opposed
>>>>> to deployment. I'm having some trouble handling errors.
>>>>> As an example, I'm trying to write an add_user task. Easy enough:
>>>>> task :add_user do
>>>>> run "useradd #{username}"
>>>>> end
>>>>> The problem is in handling error conditions. For example, right now
>>>>> I'm trying to add an administrator to a number of machines, but one of
>>>>> them is currently offline for maintenance. When I run my task, I get:
>>>>> /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8/gems/net-ssh-1.1.2/lib/net/ssh/service/forward/driver.rb:126:in `direct_channel': could not open direct channel for 65530:1425-6:22 (2, No route to host) (Net::SSH::Exception)
>>>>> The other machines work fine, and if I use a subset of roles that does
>>>>> not include the affected machine, it's all fine. However, I'd like to
>>>>> be able to specify that this task continue if one of a subset of
>>>>> machines is unavailable (since I can run it again, harmlessly,
>>>>> later). Ideally, I'd like to be able to specify the action to be
>>>>> taken for a given kind of exception raised for a task. For this one,
>>>>> for example, I might send an email to my trouble ticket system that
>>>>> useradd failed on a given machine, reminding me to do it later.
>>>>> I dug around in cli/execute, and it seems like error handling is done
>>>>> rather statically, by handle_error. Is there an accepted way to do
>>>>> this before I start overwriting that method?
--~--~---------~--~----~------------~-------~--~----~
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/capistrano
-~----------~----~----~----~------~----~------~--~---