[Capistrano] Re: Handling errors

Ben Lavender Sat, 07 Jun 2008 09:05:07 -0700

Apologies for continuing to post here, as this is really a problem
with Net::SSH::Gateway and Net::SSH, but the thread started here and I
figured I'd finish.


I thought the problem might be clearer with a test, so here it is.  It
fails against unpatched versions downloaded today.  I attempted to
integrate it into the unit tests of Net::SSH::Gateway, but was unable
to recreate the behavior with mocha.

Basically, creating a connection with Net::SSH::Gateway after a
connection has failed will always fail.  There may be other failure
conditions, such as existing connections being broken by a failed
connection.   Are you willing to merge a fix for this, be it something
along the lines of the previous one or something else entirely?  In
either case, I suspect the fix will not be a simple one.

Macintosh:~ ben$ cat gateway-test.rb
#!/usr/bin/ruby
require 'rubygems'
require 'net/ssh'
require 'net/ssh/gateway'

GATEWAY="gateway"
WORKING_HOST="working.host"
FAILING_HOST="failed.host"
USER="user"

# true: program dies from exception on Net::SSH::Gateway process thread
# false: program locks up forever joining with later connections
Thread.abort_on_exception = true

# order does not matter, but there must be both working and failing hosts
hosts = [FAILING_HOST, WORKING_HOST]

gateway = Net::SSH::Gateway.new(GATEWAY, USER)

threads = {}
hosts.each do | host |
        threads[host] = Thread.new do
                begin
                        conn = Net::SSH.start('localhost', USER, :port => gateway.open(host, 22))
                rescue Exception => error
                        # this exception is caught for FAILING_HOST, but there's no way to manage it
                        # other threads will still block or crash
                        puts "Caught error for #{host}: #{error}"
                end
        end
end

# force race condition to appear
sleep(3)

t = Thread.new do
        begin
                conn = Net::SSH.start('localhost', USER, :port => gateway.open(WORKING_HOST, 22))
        rescue Exception => error
                puts "Caught error for #{WORKING_HOST}: #{error}"
        end
end


begin
        # blocks forever with Thread.abort_on_exception == false
        puts "Attempting to join with connection attempted after failed connection..."
        t.join()

        # is sometimes reached with Thread.abort_on_exception == true, but no use
        hosts.each do | host |
                puts "joining for #{host}"
                threads[host].join
        end
# catches SystemExit thrown by Thread.abort_on_exception == true
# Gateway process thread is now dead and no longer continues, all
# connections block indefinitely when accessed.
rescue Exception => error
        puts "caught error: #{error}"
end
puts "execution continues, but the gateway no longer processes, all connections block"

Output with Thread.abort_on_exception = false:

Macintosh:~ ben$ ./gateway-test.rb
Attempting to join with connection attempted after failed connection...
Caught error for failed_host: end of file reached
<infinite wait>

Output with Thread.abort_on_exception = true:

Macintosh:~ ben$ ./gateway-test.rb
Attempting to join with connection attempted after failed connection...
Caught error for failed_host: end of file reached
/Library/Ruby/Gems/1.8/gems/net-ssh-2.0.2/lib/net/ssh/connection/session.rb:173:in `select': closed stream (IOError)
        from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.2/lib/net/ssh/connection/session.rb:173:in `process'
        from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:189:in `initiate_event_loop!'
        from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:189:in `synchronize'
        from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:189:in `initiate_event_loop!'
        from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:187:in `initialize'
        from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:187:in `new'
        from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:187:in `initiate_event_loop!'
        from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/net/ssh/gateway.rb:74:in `initialize'
        from ./gateway-test.rb:18:in `new'
        from ./gateway-test.rb:18
caught error: #<SystemExit:0x5268f4>
execution continues, but the gateway no longer processes, all connections block
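
The `closed stream (IOError)` at the top of that trace does not need SSH to reproduce. As a minimal sketch, assuming only Ruby's standard library and using a plain pipe as a stand-in for a forwarded socket (the pipe and the variable names are illustrative, not part of Net::SSH), passing an already closed IO to IO.select raises instead of reporting the descriptor as being in an error state:

```ruby
# Stand-in for a forwarded connection: an ordinary pipe.
reader, writer = IO.pipe

# Simulate the socket being closed while it is still in the select set.
reader.close

begin
  IO.select([reader], nil, nil, 0.1)
  outcome = "select returned normally"
rescue IOError => error
  # select raises for the closed IO rather than returning it as errored
  outcome = "select raised #{error.class}: #{error.message}"
end

puts outcome
writer.close
```

Any loop that calls select without a rescue around it ends at that point, which is consistent with the process thread dying in the trace above.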

Thanks,
Ben

On May 27, 9:59 pm, "Ben Lavender" <[EMAIL PROTECTED]> wrote:
> Curiosity has killed the metaphorical cat, the cat, in this case,
> being my time.  Here's a patch, but you may not like it, as it
> involves touching Capistrano, Net::SSH, and Net::SSH::Gateway, and
> thus probably qualifies as 'unnatural'.  As you wrote all three, I'm
> tossing the diffs out here.  If you'd be willing to merge it, I'll
> clean it up some, update the docs and whatnot.
>
> The root problem here is that Ruby's IO.select will toss errors if a
> socket throws an error (as they do when the underlying SSH channel is
> broken), instead of marking it as being in an error state and
> returning.  This means that any Net::SSH::Gateway that has multiple
> threads will have its process thread die if any one of the connections
> is lost or fails to be created.  The following diffs solve the problem
> by allowing one to pass Net::SSH::Gateway a block to be called upon an
> exception being thrown by its session's process() call.  Net::SSH
> doesn't provide enough information at present to deal with any errors
> sent up by IO.select, so it needs a change, too (and a far better one
> than the hack here).
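
The idea of handing exceptions from the event loop to a caller-supplied block can be sketched independently of the gateway. The following is an illustration only: `EventLoop`, `start`, `stop`, and the error block are hypothetical names, not part of Net::SSH or Net::SSH::Gateway, and the step that raises stands in for a session's process() call.

```ruby
# Hypothetical stand-in for a gateway-style event loop; not the real API.
class EventLoop
  def initialize(&on_error)
    @on_error = on_error   # optional block called with each exception
    @mutex = Mutex.new
    @active = true
  end

  # Runs the given step repeatedly on a background thread.
  def start(&step)
    @thread = Thread.new do
      while @active
        begin
          @mutex.synchronize { step.call }
        rescue StandardError => error
          # Without a handler the exception kills the loop thread,
          # which mirrors the behaviour described above.
          raise unless @on_error
          @on_error.call(error)
        end
        sleep 0.01
      end
    end
    self
  end

  def stop
    @active = false
    @thread.join
  end
end

errors = []
calls = 0
event_loop = EventLoop.new { |error| errors << error }
event_loop.start do
  calls += 1
  # The first iteration fails the way a broken connection would.
  raise IOError, "closed stream" if calls == 1
end
sleep 0.2
event_loop.stop

puts "errors handled: #{errors.length}, loop kept running: #{calls > 1}"
```

With a block supplied, the loop survives the failing iteration and keeps servicing later ones; with no block, the exception propagates and the thread stops, so callers waiting on it would block.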
>
> diffs against git on 27/5/08:
>
> capistrano/configuration/connections.rb:
> 25c25
> <         def initialize(gateway, options)
> ---
> >         def initialize(gateway, options, ignore_errors)
> 29a30
> >          failure_block = ignore_errors ? lambda { } : nil
>
> 31c32
> <             Net::SSH::Gateway.new(host, user, connect_options)
> ---
> >             Net::SSH::Gateway.new(host, user, connect_options,failure_block)
>
> 83c84
> <             GatewayConnectionFactory.new(fetch(:gateway), self)
> ---
> >             GatewayConnectionFactory.new(fetch(:gateway), self, current_task.continue_on_error?)
>
> 100c101,110
> <         threads.each { |t| t.join }
> ---
> >            timeout = exists?(:connection_timeout) ? fetch(:connection_timeout) : 10
> >            threads.each { |t| t.join(timeout) }
> >
> >            servers.each do |server|
> >              if !(sessions[server]) then
> >                    failed_servers << { :server => server, :error => Net::SSH::Exception.new("Failed to connect to #{server}") }
> >                    logger.debug "Failed to connect to `#{server}' via gateway"
> >              end
> >            end
>
> net/ssh/gateway.rb:
> 68c68,69
> <   def initialize(host, user, options={})
> ---
> >   def initialize(host, user, options={},block=nil)
> >     @failure_block ||= block
>
> 189c190,200
> <           @session_mutex.synchronize { @session.process(0.1) }
> ---
>
> >           @session_mutex.synchronize {
> >               begin
> >                 @session.process(0.1)
> >               rescue Exception => error
> >                 if @failure_block then
> >                       @failure_block.call(@session,error)
> >                     else
> >                       raise error
> >                     end
> >               end
> >              }
>
> net/ssh/service/forward.rb:
> 79c79
> <           channel[:socket].close
> ---
>
> >           #channel[:socket].close
>
> forward.rb would need a more generic solution to be correct, but
> simply closing the socket on error without closing a socket built over
> it will simply cause the Gateway's process() to throw exceptions
> endlessly. Forward needs to offer another interface to respond to what
> IO.select() throws.  That's another solution I'd be willing to hammer
> out, if you're willing to merge this stuff.
>
> Let me know,
> Ben
>
> On Wed, May 21, 2008 at 5:57 PM, Ben Lavender <[EMAIL PROTECTED]> wrote:
>
> > I spent some time playing with this today, and it seems to be based on
> > Net::SSH's not being thread safe. [1]  The short version is that when
> > the gateway host kicks back the 'host unreachable' message, all of the
> > connection threads lock up/die/go away.  The exception wanders up the
> > stack and is handled normally, but all of the other threads stop.
>
> > I'm dubious about the possibility of creating a patch for this that
> > doesn't do unnatural things to the code.  I'm not sure if that means I
> > can use cap or not, for what I'm trying to do, but I'll find another
> > way to make things work if I do.
>
> > Thanks,
> > Ben
>
> > [1]:http://weblog.jamisbuck.org/2008/3/18/net-ssh-and-thread-safety
>
> > On May 21, 3:08 am, Jamis Buck <[EMAIL PROTECTED]> wrote:
> >> It's an exception. If it pains someone enough to write a patch for it,
> >> I'd consider applying it, if it doesn't do unnatural things to the code.
>
> >> - Jamis
>
> >> On May 20, 2008, at 4:16 PM, David Masover wrote:
>
> >> > I'm not sure yet whether that's a pattern or an antipattern. If it's
> >> > a pattern, then maybe we could do something like:
>
> >> > HOSTS="-foo"
>
> >> > to remove host foo from whatever the normal host list would be?
>
> >> > On Tue, May 20, 2008 at 3:03 PM, Jamis Buck <[EMAIL PROTECTED]>
> >> > wrote:
> >> > Honestly, I think I'd recommend just removing the server in question
> >> > from the server list temporarily, running your stuff, and then
> >> > adding it back. I might consider a patch to capistrano to work
> >> > around this, but at the same time, capistrano is already
> >> > ridiculously complex in places.
>
> >> > - Jamis
>
> >> > On May 20, 2008, at 1:54 PM, Ben Lavender wrote:
>
> >> > Ah, oops, err, pardon me for not posting everything I had tried, but
> >> > alas, :on_error does not do the trick here.  The current version is:
>
> >> > task :add_user, :on_error => :continue do
> >> >   prompt(:username)
> >> >   #prompt(:new_password)
> >> >   begin
> >> >       run "useradd #{username}"
> >> >   rescue Exception => error
> >> >       puts "Caught an error woo woo! It's " + error
> >> >   end
> >> > end
>
> >> > This still dies:
> >> > /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.1/lib/net/ssh/connection/
> >> > session.rb:173:in `select': closed stream (IOError)
> >> >        from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.1/lib/net/ssh/
> >> > connection/
> >> > session.rb:173:in `process'
> >> >        from /Library/Ruby/Gems/1.8/gems/net-ssh-gateway-1.0.0/lib/
> >> > net/ssh/
> >> > gateway.rb:189:in `initiate_event_loop!'
>
> >> > In addition, catching the Exception processes the SystemExit on its
> >> > way up the stack, albeit not gracefully.  It's too late to do any
> >> > good, it seems:
> >> > ./sysadmin.cap.rb:39:in `+': SystemExit#to_str should return String
> >> > (TypeError)
> >> >        from ./sysadmin.cap.rb:39:in `load'
> >> >        from /Library/Ruby/Gems/1.8/gems/capistrano-2.3.0/lib/
> >> > capistrano/
> >> > configuration/execution.rb:80:in `instance_eval'
>
> >> > I should also mention I'm using 2.3.0 with capistrano-ext 1.2.0, both
> >> > freshly updated via gem today.
>
> >> > I'm new to this, so I'm probably missing something; any ideas?
>
> >> > Ben
>
> >> > On May 20, 9:38 pm, Jamis Buck <[EMAIL PROTECTED]> wrote:
> >> > Ben,
>
> >> > It sounds like you want the :on_error => :continue option for the
> >> > task:
>
> >> >  task :add_user, :on_error => :continue do
> >> >    # ...
> >> >  end
>
> >> > With that option set, connection errors and runtime errors will be
> >> > dutifully logged, but capistrano will not abort.
>
> >> > - Jamis
>
> >> > On May 20, 2008, at 5:18 AM, Ben Lavender wrote:
>
> >> > Hi all,
>
> >> > I'm looking into using Capistrano for system administration as opposed
> >> > to deployment.  I'm having some trouble handling errors.
>
> >> > As an example, I'm trying to write an add_user task.  Easy enough:
>
> >> > task :add_user do
> >> >  run "useradd #{username}"
> >> > end
>
> >> > The problem is in handling error conditions.  For example, right now
> >> > I'm trying to add an administrator to a number of machines, but one of
> >> > them is currently offline for maintenance.  When I run my task, I get:
> >> > /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/
> >> > gems/1.8/gems/net-ssh-1.1.2/lib/net/ssh/service/forward/driver.rb:
> >> > 126:in `direct_channel': could not open direct channel for
> >> > 65530:1425-6:22 (2, No route to host) (Net::SSH::Exception)
>
> >> > The other machines work fine, and if I use a subset of roles that does
> >> > not include the affected machine, it's all fine.  However, I'd like to
> >> > be able to specify that this task continue if one of a subset of
> >> > machines is unavailable (since I can run it again, harmlessly,
> >> > later).  Ideally, I'd like to be able to specify the action to be
> >> > taken for a given kind of exception raised for a task.  For this one,
> >> > for example, I might send an email to my trouble ticket system that
> >> > useradd failed on a given machine, reminding me to do it later.
>
> >> > I dug around in cli/execute, and it seems like error handling is done
> >> > rather statically, by handle_error.  Is there an accepted way to do
> >> > this before I start overwriting that method?
>
--~--~---------~--~----~------------~-------~--~----~
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/capistrano
-~----------~----~----~----~------~----~------~--~---
