Thanks for your investigation of this issue, Ben. I'll take a closer
look at what you've got here and see what can be done.

- Jamis

Ben Lavender wrote:
> Apologies for continuing to post here, as this is really a problem
> with Net::SSH::Gateway and Net::SSH, but the thread started here and I
> figured I'd finish.
> 
> I thought the problem might be clearer with a test, so here it is.  It
> fails against unpatched versions downloaded today.  I attempted to
> integrate it into the unit tests of Net::SSH::Gateway, but was unable
> to recreate the behavior with mocha.
> 
> Basically, creating a connection with Net::SSH::Gateway after a
> connection has failed will always fail.  There may be other failure
> conditions, such as existing connections being broken by a failed
> connection.   Are you willing to merge a fix for this, be it something
> along the lines of the previous one or something else entirely?  In
> either case, I suspect the fix will not be a simple one.
> 
> Macintosh:~ ben$ cat gateway-test.rb
> #!/usr/bin/ruby
> require 'rubygems'
> require 'net/ssh'
> require 'net/ssh/gateway'
> 
> GATEWAY="gateway"
> WORKING_HOST="working.host"
> FAILING_HOST="failed.host"
> USER="user"
> 
> # true: program dies from exception on Net::SSH::Gateway process thread
> # false: program locks up forever joining with later connections
> Thread.abort_on_exception = true
> 
> # order does not matter, but there must be both working and failing hosts
> hosts = [FAILING_HOST, WORKING_HOST]
> 
> gateway = Net::SSH::Gateway.new(GATEWAY, USER)
> 
> threads = {}
> hosts.each do |host|
>   threads[host] = Thread.new do
>     begin
>       conn = Net::SSH.start('localhost', USER, :port => gateway.open(host, 22))
>     rescue Exception => error
>       # this exception is caught for FAILING_HOST, but there's no way to manage it
>       # other threads will still block or crash
>       puts "Caught error for #{host}: #{error}"
>     end
>   end
> end
> 
> # force race condition to appear
> sleep(3)
> 
> t = Thread.new do
>   begin
>     conn = Net::SSH.start('localhost', USER, :port => gateway.open(WORKING_HOST, 22))
>   rescue Exception => error
>     puts "Caught error for #{WORKING_HOST}: #{error}"
>   end
> end
> 
> begin
>   # blocks forever with Thread.abort_on_exception == false
>   puts "Attempting to join with connection attempted after failed connection..."
>   t.join()
> 
>   # is sometimes reached with Thread.abort_on_exception == true, but no use
>   hosts.each do |host|
>     puts "joining for #{host}"
>     threads[host].join
>   end
> # catches SystemExit thrown by Thread.abort_on_exception == true
> # Gateway process thread is now dead and no longer continues, all connections block indefinitely when accessed.
> rescue Exception => error
>   puts "caught error: #{error}"
> end
> puts "execution continues, but the gateway no longer processes, all connections block"
> 
> Output with Thread.abort_on_exception = false:
> 
> Macintosh:~ ben$ ./gateway-test.rb
> Attempting to join with connection attempted after failed connection...
> Caught error for failed_host: end of file reached
> <infinite wait>
> 
> Output with Thread.abort_on_exception = true:
> 
> Macintosh:~ ben$ ./gateway-test.rb
> Attempting to join with connection attempted after failed connection...
> Caught error for failed_host: end of file reached
> /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.2/lib/net/ssh/connection/session.rb:173:in `select': closed stream (IOError)
>       from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.2/lib/net/ssh/connection/session.rb:173:in `process'
>       from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:189:in `initiate_event_loop!'
>       from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:189:in `synchronize'
>       from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:189:in `initiate_event_loop!'
>       from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:187:in `initialize'
>       from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:187:in `new'
>       from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:187:in `initiate_event_loop!'
>       from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:74:in `initialize'
>       from ./gateway-test.rb:18:in `new'
>       from ./gateway-test.rb:18
> caught error: #<SystemExit:0x5268f4>
> execution continues, but the gateway no longer processes, all connections block
> 
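
The "infinite wait" case above can be reproduced without Net::SSH at all. The following is only a sketch: `worker` is a hypothetical stand-in for the gateway's event-loop thread, and the IOError is raised by hand rather than by a real session. It shows that with Thread.abort_on_exception = false the thread dies quietly, and the exception only surfaces if someone joins that thread.

```ruby
Thread.abort_on_exception = false
# Newer Rubies print a report for threads that die; silence it for the demo.
Thread.report_on_exception = false if Thread.respond_to?(:report_on_exception=)

# Hypothetical stand-in for the gateway's event-loop thread.
worker = Thread.new { raise IOError, "closed stream" }

# Wait until the thread has actually terminated.
sleep 0.01 while worker.alive?

alive_after_failure  = worker.alive?   # the thread is gone
status_after_failure = worker.status   # nil means "terminated by an exception"

# The exception is only re-raised in the thread that joins the dead one.
joined_error = begin
  worker.join
  nil
rescue IOError => error
  error
end

puts "alive: #{alive_after_failure.inspect}, status: #{status_after_failure.inspect}"
puts "join re-raised: #{joined_error.class}"
```

Nothing in the test script ever joins the gateway's own thread, which would be consistent with the hang: the callers wait on work that a dead thread will never do.
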
> Thanks,
> Ben
> 
> On May 27, 9:59 pm, "Ben Lavender" <[EMAIL PROTECTED]> wrote:
>> Curiosity has killed the metaphorical cat, the cat, in this case,
>> being my time.  Here's a patch, but you may not like it, as it
>> involves touching Capistrano, Net::SSH, and Net::SSH::Gateway, and
>> thus probably qualifies as 'unnatural'.  As you wrote all three, I'm
>> tossing the diffs out here.  If you'd be willing to merge it, I'll
>> clean it up some, update the docs and whatnot.
>>
>> The root problem here is that Ruby's IO.select will toss errors if a
>> socket throws an error (as they do when the underlying SSH channel is
>> broken), instead of marking it as being in an error state and
>> returning.  This means that any Net::SSH::Gateway that has multiple
>> threads will have its process thread die if any one of the connections
>> is lost or fails to be created.  The following diffs solve the problem
>> by allowing one to pass Net::SSH::Gateway a block to be called upon an
>> exception being thrown by its session's process() call.  Net::SSH
>> doesn't provide enough information at present to deal with any errors
>> sent up by IO.select, so it needs a change, too (and a far better one
>> than the hack here).
>>
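
The IO.select behaviour described above is easy to see in isolation. This is a minimal sketch using a plain pipe instead of an SSH-forwarded socket (an assumption made to keep it self-contained): once one of the IO objects handed to IO.select has been closed, the call raises instead of returning.

```ruby
reader, writer = IO.pipe

# Simulate a connection whose IO object was closed elsewhere while the
# event loop still holds a reference to it.
reader.close

select_error = begin
  IO.select([reader], nil, nil, 0.1)
  nil
rescue IOError => error
  error
end

puts "IO.select raised: #{select_error.class}" if select_error
writer.close
```

Because a single closed IO makes the whole call raise, one bad connection is enough to take down a loop that is selecting on many of them.
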
>> diffs against git on 27/5/08:
>>
>> capistrano/configuration/connections.rb:
>> 25c25
>> <         def initialize(gateway, options)
>> ---
>> >         def initialize(gateway, options, ignore_errors)
>> 29a30
>> >          failure_block = ignore_errors ? lambda { } : nil
>> 31c32
>> <             Net::SSH::Gateway.new(host, user, connect_options)
>> ---
>> >             Net::SSH::Gateway.new(host, user, connect_options,failure_block)
>>
>> 83c84
>> <             GatewayConnectionFactory.new(fetch(:gateway), self)
>> ---
>> >             GatewayConnectionFactory.new(fetch(:gateway), self, current_task.continue_on_error?)
>>
>> 100c101,110
>> <         threads.each { |t| t.join }
>> ---
>> >            timeout = exists?(:connection_timeout) ? fetch(:connection_timeout) : 10
>> >            threads.each { |t| t.join(timeout) }
>> >            servers.each do |server|
>> >              if !(sessions[server]) then
>> >                    failed_servers << { :server => server, :error => Net::SSH::Exception.new("Failed to connect to #{server}") }
>> >                    logger.debug "Failed to connect to `#{server}' via gateway"
>> >              end
>> >            end
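
The join-with-timeout idea in the connections.rb change can be sketched on its own. Everything here is hypothetical: the server names, the delays, and the `sessions` hash are placeholders, and `sleep` stands in for the connection attempt. The point is only that Thread#join(limit) returns nil when the thread is still running, so servers without a session afterwards can be reported as failed.

```ruby
# Hypothetical servers and connection delays; no real connections are made.
delays = { "fast.example" => 0.01, "stuck.example" => 5 }
sessions = {}
sessions_mutex = Mutex.new

threads = delays.map do |server, delay|
  Thread.new do
    sleep delay  # stands in for a connection attempt that may never return
    sessions_mutex.synchronize { sessions[server] = :connected }
  end
end

timeout = 0.5
# join(limit) gives up after `limit` seconds and returns nil if still running.
threads.each { |t| t.join(timeout) }

failed_servers = sessions_mutex.synchronize do
  delays.keys.reject { |server| sessions.key?(server) }
end

# Clean up the thread that is still sleeping so the script can exit promptly.
threads.each { |t| t.kill if t.alive? }

puts "failed: #{failed_servers.inspect}"
```

One thing to keep in mind with this shape: the joins run one after another, so the worst-case wait is the timeout multiplied by the number of threads that never finish.
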
>> net/ssh/gateway.rb:
>> 68c68,69
>> <   def initialize(host, user, options={})
>> ---
>> >   def initialize(host, user, options={},block=nil)
>> >    @failure_block ||= block
>> 189c190,200
>> <           @session_mutex.synchronize { @session.process(0.1) }
>> ---
>> >           @session_mutex.synchronize {
>> >               begin
>> >                 @session.process(0.1)
>> >               rescue Exception => error
>> >                 if @failure_block then
>> >                       @failure_block.call(@session,error)
>> >                     else
>> >                       raise error
>> >                     end
>> >               end
>> >              }
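
The failure-block pattern in the gateway.rb change can be tried outside Net::SSH::Gateway. The sketch below is not the gateway's code: `EventLoop`, its constructor arguments, and the work block are all hypothetical names, and it rescues StandardError rather than Exception as the patch does. It only illustrates how a handler lets the loop thread survive an error raised by one iteration.

```ruby
# Hypothetical stand-in for the gateway's event loop, not Net::SSH::Gateway.
class EventLoop
  def initialize(on_failure = nil, &work)
    @on_failure = on_failure
    @work = work
    @mutex = Mutex.new
    @active = true
    @thread = Thread.new { run }
  end

  def stop
    @active = false
    @thread.join
  end

  private

  def run
    while @active
      @mutex.synchronize do
        begin
          @work.call
        rescue StandardError => error
          # Without a handler, re-raise and let the loop thread die as before.
          raise unless @on_failure
          @on_failure.call(error)
        end
      end
      sleep 0.01
    end
  end
end

errors = Queue.new
calls = 0

event_loop = EventLoop.new(lambda { |error| errors << error }) do
  calls += 1
  raise IOError, "closed stream" if calls == 1  # first iteration fails
end

sleep 0.01 until calls >= 3  # the loop kept running after the failure
event_loop.stop

puts "iterations: #{calls}, errors handled: #{errors.size}"
```

Passing no handler keeps the old behaviour, which matches the intent of the patch: existing callers are unaffected unless they opt in.
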
>> net/ssh/service/forward.rb:
>> 79c79
>> <           channel[:socket].close
>> ---
>> >           #channel[:socket].close
>> forward.rb would need a more generic solution to be correct, but
>> simply closing the socket on error without closing a socket built over
>> it will simply cause the Gateway's process() to throw exceptions
>> endlessly. Forward needs to offer another interface to respond to what
>> IO.select() throws.  That's another solution I'd be willing to hammer
>> out, if you're willing to merge this stuff.
>>
>> Let me know,
>> Ben
>>
>> On Wed, May 21, 2008 at 5:57 PM, Ben Lavender <[EMAIL PROTECTED]> wrote:
>>
>>> I spent some time playing with this today, and it seems to be based on
>>> Net::SSH's not being thread safe. [1]  The short version is that when
>>> the gateway host kicks back the 'host unreachable' message, all of the
>>> connection threads lock up/die/go away.  The exception wanders up the
>>> stack and is handled normally, but all of the other threads stop.
>>> I'm dubious about the possibility of creating a patch for this that
>>> doesn't do unnatural things to the code.  I'm not sure if that means I
>>> can use cap or not, for what I'm trying to do, but I'll find another
>>> way to make things work if I do.
>>> Thanks,
>>> Ben
>>> [1]:http://weblog.jamisbuck.org/2008/3/18/net-ssh-and-thread-safety
>>> On May 21, 3:08 am, Jamis Buck <[EMAIL PROTECTED]> wrote:
>>>> It's an exception. If it pains someone enough to write a patch for it,
>>>> I'd consider applying it, if it doesn't do unnatural things to the code.
>>>> - Jamis
>>>> On May 20, 2008, at 4:16 PM, David Masover wrote:
>>>>> I'm not sure yet whether that's a pattern or an antipattern. If it's
>>>>> a pattern, then maybe we could do something like:
>>>>> HOSTS="-foo"
>>>>> to remove host foo from whatever the normal host list would be?
>>>>> On Tue, May 20, 2008 at 3:03 PM, Jamis Buck <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>> Honestly, I think I'd recommend just removing the server in question
>>>>> from the server list temporarily, running your stuff, and then
>>>>> adding it back. I might consider a patch to capistrano to work
>>>>> around this, but at the same time, capistrano is already
>>>>> ridiculously complex in places.
>>>>> - Jamis
>>>>> On May 20, 2008, at 1:54 PM, Ben Lavender wrote:
>>>>> Ah, oops, err, pardon me for not posting everything I had tried, but
>>>>> alas, :on_error does not do the trick here.  The current version is:
>>>>> task :add_user, :on_error => :continue do
>>>>>   prompt(:username)
>>>>>   #prompt(:new_password)
>>>>>   begin
>>>>>       run "useradd #{username}"
>>>>>   rescue Exception => error
>>>>>       puts "Caught an error woo woo! It's " + error
>>>>>   end
>>>>> end
>>>>> This still dies:
>>>>> /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.1/lib/net/ssh/connection/session.rb:173:in `select': closed stream (IOError)
>>>>>        from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.1/lib/net/ssh/connection/session.rb:173:in `process'
>>>>>        from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:189:in `initiate_event_loop!'
>>>>> In addition, catching the Exception processes the SystemExit on its
>>>>> way up the stack, albeit not gracefully.  It's too late to do any
>>>>> good, it seems:
>>>>> ./sysadmin.cap.rb:39:in `+': SystemExit#to_str should return String (TypeError)
>>>>>        from ./sysadmin.cap.rb:39:in `load'
>>>>>        from /Library/Ruby/Gems/1.8/gems/capistrano-2.3.0/lib/capistrano/configuration/execution.rb:80:in `instance_eval'
>>>>> I should also mention I'm using 2.3.0 with capistrano-ext 1.2.0, both
>>>>> freshly updated via gem today.
>>>>> I'm new to this, so I'm probably missing something; any ideas?
>>>>> Ben
>>>>> On May 20, 9:38 pm, Jamis Buck <[EMAIL PROTECTED]> wrote:
>>>>> Ben,
>>>>> It sounds like you want the :on_error => :continue option for the
>>>>> task:
>>>>>  task :add_user, :on_error => :continue do
>>>>>    # ...
>>>>>  end
>>>>> With that option set, connection errors and runtime errors will be
>>>>> dutifully logged, but capistrano will not abort.
>>>>> - Jamis
>>>>> On May 20, 2008, at 5:18 AM, Ben Lavender wrote:
>>>>> Hi all,
>>>>> I'm looking into using Capistrano for system administration as opposed
>>>>> to deployment.  I'm having some trouble handling errors.
>>>>> As an example, I'm trying to write an add_user task.  Easy enough:
>>>>> task :add_user do
>>>>>  run "useradd #{username}"
>>>>> end
>>>>> The problem is in handling error conditions.  For example, right now
>>>>> I'm trying to add an administrator to a number of machines, but one of
>>>>> them is currently offline for maintenance.  When I run my task, I get:
>>>>> /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8/gems/net-ssh-1.1.2/lib/net/ssh/service/forward/driver.rb:126:in `direct_channel': could not open direct channel for 65530:1425-6:22 (2, No route to host) (Net::SSH::Exception)
>>>>> The other machines work fine, and if I use a subset of roles that does
>>>>> not include the affected machine, it's all fine.  However, I'd like to
>>>>> be able to specify that this task continue if one of a subset of
>>>>> machines is unavailable (since I can run it again, harmlessly,
>>>>> later).  Ideally, I'd like to be able to specify the action to be
>>>>> taken for a given kind of exception raised for a task.  For this one,
>>>>> for example, I might send an email to my trouble ticket system that
>>>>> useradd failed on a given machine, reminding me to do it later.
>>>>> I dug around in cli/execute, and it seems like error handling is done
>>>>> rather statically, by handle_error.  Is there an accepted way to do
>>>>> this before I start overwriting that method?


--~--~---------~--~----~------------~-------~--~----~
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/capistrano
-~----------~----~----~----~------~----~------~--~---
