Ah-ha. Yeah, the serial restart is probably the culprit. What I  
suspect is happening is that while the first servers are restarting,  
the connections for the others are being starved, since cap is only  
running the commands against a subset of servers (and isn't processing  
events from the other servers at all). Eventually the starved
connections die, and when cap next tries to use them: *crash*.
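Here is a small Ruby model of the starvation described above, to make the timing concrete. It is only an illustration, not Capistrano code. The server names and restart durations are made up. It also assumes that the connections were already open when the serial restart began, and that sshd drops a connection after roughly ClientAliveInterval × ClientAliveCountMax seconds without a reply (15 × 15, the values from the config quoted further down).

```ruby
# Hypothetical model of connection starvation during a serial restart.
# Not Capistrano code: server names and durations are invented.

# Assumed restart time (seconds) for each app server, in restart order.
RESTART_SECONDS = { "app1" => 120, "app2" => 120, "app3" => 120 }.freeze

# Assumed drop threshold: ClientAliveInterval 15 x ClientAliveCountMax 15.
IDLE_LIMIT = 15 * 15

# For each server, how long its already-open connection sits unserviced
# before cap finally runs a command on it, given that cap only services
# one server at a time.
def idle_time_before_use(restart_seconds)
  elapsed = 0
  restart_seconds.each_with_object({}) do |(server, seconds), idle|
    idle[server] = elapsed # idle through every restart that came before it
    elapsed += seconds
  end
end

idle = idle_time_before_use(RESTART_SECONDS)
dropped = idle.select { |_server, seconds| seconds > IDLE_LIMIT }.keys

puts "idle seconds before use: #{idle.inspect}"
puts "likely dropped before use: #{dropped.inspect}"
```

With these invented numbers only app3 crosses the threshold (240 idle seconds against a 225-second limit), so the first two restarts succeed and the third one hits a dead connection.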

We had to deal with this in one of our recipes, where we throw up a  
text editor and ask the deployer to describe the changes being  
deployed. While the text editor was open, the connections were all  
being starved, so if the deployer took too long composing their novel,  
things would blow up. :(

The workaround is this: close all sessions before you start serially
restarting servers, and after each one finishes restarting, close the
connection again. The worst that happens here is that cap has to
reconnect to those servers multiple times, which takes a few seconds.
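A minimal sketch of that close-then-reconnect pattern follows. It is written against a hypothetical in-memory cache rather than Capistrano itself: ConnectionCache, session_for, teardown, open_servers and serial_restart are all invented for illustration. The point it shows is that each server gets a fresh connection right before it is used, so no connection is left idling while the other servers restart.

```ruby
# Hypothetical stand-in for cap's session cache. This is NOT Capistrano's
# API, just enough to show the close-then-reconnect pattern.
class ConnectionCache
  attr_reader :connects

  def initialize
    @sessions = {}
    @connects = 0
  end

  # Reuse an open session for the server, or open (count) a fresh one.
  def session_for(server)
    @sessions[server] ||= begin
      @connects += 1
      "session-#{server}-#{@connects}"
    end
  end

  # Close (here: simply forget) the sessions for the given servers.
  def teardown(servers)
    servers.each { |server| @sessions.delete(server) }
  end

  def open_servers
    @sessions.keys
  end
end

# Hypothetical serial restart: the block stands in for "restart the
# mongrels and wait for their pid files" on one server.
def serial_restart(cache, servers)
  # Drop every cached connection first, so none of them starves while
  # the other servers restart.
  cache.teardown(cache.open_servers)
  servers.each do |server|
    session = cache.session_for(server) # fresh connection for this server
    yield server, session
    # Close it again before moving on; reconnecting later is cheap.
    cache.teardown(cache.open_servers)
  end
end

# Usage: connections opened earlier in the deploy, then a serial restart.
cache = ConnectionCache.new
%w[app1 app2 app3].each { |server| cache.session_for(server) }
used = []
serial_restart(cache, %w[app1 app2 app3]) { |_server, session| used << session }
puts used.inspect
```

The cost shows up in `cache.connects`: six connections in total for three servers instead of three, which is the "reconnect multiple times" trade-off.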

It should work to just put the following at the top of serial_restart:

   teardown_connections_to(sessions.keys)

Let me know if that makes any difference,

Jamis

On Jul 15, 2008, at 3:09 PM, matt wrote:

>
> My deploy scripts are a little complicated :)  They are actually part
> of the rubber framework for deploying Rails apps to EC2.  Here is the
> relevant portion of the deploy script.  If you want to run this
> yourself, I can easily set up a set of EC2 instances that mimic my
> setup and give you access (it would just be the rubber quickstart plus
> a couple of extra app servers).
>
> http://github.com/wr0ngway/rubber/tree/master/generators/vulcanize/templates/mongrel/config/rubber/deploy-mongrel.rb
>
> The part where I am running into trouble is when I serially restart
> mongrels on each app server in turn, waiting for the app server to
> come up before moving on to the next.  Servers 1 and 2 usually complete
> OK, but I get the ECONNRESET when it tries to talk to server 3, I
> think because the ssh connection times out while it is sitting idle
> for too long waiting for servers 1 and 2.  I'm not running anything
> locally, just running a while loop remotely waiting for all the
> mongrel pid files to show up, thereby indicating they have all
> started, and it's safe to move on to the next server to restart the
> mongrels there.
>
> Matt
>
> On Jul 14, 11:38 pm, Jamis Buck <[EMAIL PROTECTED]> wrote:
>> Any chance you could email me your recipe files and let me know what
>> task you are invoking on the command line? I wonder if there's a bit
>> of Ruby code or shell commands that are running for a while on your
>> local machine before that place where the error happens.
>>
>> - Jamis
>>
>> On Jul 14, 2008, at 8:53 PM, matt wrote:
>>
>>
>>
>>> Nope,
>>
>>> ClientAliveInterval 15
>>> ClientAliveCountMax 15
>>
>>> gives the following stack trace every time in the same place (when
>>> it tries to restart the 3rd of 3 app server instances):
>>
>>> /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/buffered_io.rb:98:in `send': closed stream (IOError)
>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/buffered_io.rb:98:in `send_pending'
>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:208:in `postprocess'
>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:207:in `each'
>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:207:in `postprocess'
>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:31:in `process_iteration'
>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:43:in `ensure_each_session'
>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:41:in `each'
>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:41:in `ensure_each_session'
>>>     ... 75 levels...
>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/cli/execute.rb:14:in `execute'
>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/bin/cap:4
>>>    from /usr/bin/cap:19:in `load'
>>>    from /usr/bin/cap:19
>>
>>> or sometimes this trace earlier in the process:
>>
>>> /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:523:in `channel_request': undefined method `do_request' for nil:NilClass (NoMethodError)
>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:428:in `send'
>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:428:in `dispatch_incoming_packets'
>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:185:in `preprocess'
>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:17:in `process_iteration'
>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:43:in `ensure_each_session'
>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:41:in `each'
>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:41:in `ensure_each_session'
>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:17:in `process_iteration'
>>>     ... 74 levels...
>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/cli/execute.rb:14:in `execute'
>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/bin/cap:4
>>>    from /usr/bin/cap:19:in `load'
>>>    from /usr/bin/cap:19
>>
>>> On Jul 14, 3:37 pm, Jamis Buck <[EMAIL PROTECTED]> wrote:
>>>> Try reducing the interval to 15 or 20 and see if that makes any
>>>> difference. If it doesn't, try setting ClientAliveCountMax to 6 or
>>>> higher.
>>
>>>> - Jamis
>>
>>>> On Jul 14, 2008, at 1:07 PM, matt wrote:
>>
>>>>> Ok, I added "ClientAliveInterval 60", and now I'm getting a couple
>>>>> of different errors, at the exact same point in the deploy process
>>>>> that I was getting the ECONNRESET, pretty repeatable, and probably
>>>>> too strange to be coincidence:
>>
>>>>> /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/buffered_io.rb:98:in `send': closed stream (IOError)
>>>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/buffered_io.rb:98:in `send_pending'
>>>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:208:in `postprocess'
>>>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:207:in `each'
>>>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:207:in `postprocess'
>>>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:31:in `process_iteration'
>>>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:43:in `ensure_each_session'
>>>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:41:in `each'
>>>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:41:in `ensure_each_session'
>>>>>     ... 75 levels...
>>>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/cli/execute.rb:14:in `execute'
>>>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/bin/cap:4
>>>>>    from /usr/bin/cap:19:in `load'
>>>>>    from /usr/bin/cap:19
>>
>>>>> /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:523:in `channel_request': undefined method `do_request' for nil:NilClass (NoMethodError)
>>>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:428:in `send'
>>>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:428:in `dispatch_incoming_packets'
>>>>>    from /Library/Ruby/Gems/1.8/gems/net-ssh-2.0.3/lib/net/ssh/connection/session.rb:185:in `preprocess'
>>>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:17:in `process_iteration'
>>>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:43:in `ensure_each_session'
>>>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:41:in `each'
>>>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:41:in `ensure_each_session'
>>>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/processable.rb:17:in `process_iteration'
>>>>>     ... 74 levels...
>>>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/lib/capistrano/cli/execute.rb:14:in `execute'
>>>>>    from /Library/Ruby/Gems/1.8/gems/capistrano-2.4.3/bin/cap:4
>>>>>    from /usr/bin/cap:19:in `load'
>>>>>    from /usr/bin/cap:19
>>
>>>>> On Jul 8, 4:33 pm, matt <[EMAIL PROTECTED]> wrote:
>>>>>> I have TCPKeepAlive turned on, but not ClientAliveInterval, I'll
>>>>>> try that.  Thanks,
> >


--~--~---------~--~----~------------~-------~--~----~
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/capistrano
-~----------~----~----~----~------~----~------~--~---
