I have the same problem. People (including myself) use monit to make sure processes don't freeze, and some also set up scripts to restart individual mongrels periodically (mine restart every 30 minutes). My setup is Linux 2.6.18; everything else is the same, except that I upgraded mongrel, which had no effect on the problem. I'd appreciate suggestions from anyone who has encountered and successfully resolved this issue.
_____

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michael Kovacs
Sent: Tuesday, January 02, 2007 2:10 PM
To: mongrel-users@rubyforge.org
Subject: [Mongrel] problems with apache 2.2 proxying to mongrel cluster

Hi all,

I've been having problems with the apache 2.2-mod_proxy_balancer-mongrel setup. My setup is:

CentOS 4.3
apache 2.2.3 (compiled from source) with mod_proxy_balancer
mysql 4.1
ruby 1.8.4
mongrel 0.3.14 (I know I need to update but I think this problem is independent of the mongrel version)
mongrel_cluster 0.2.0
rails_machine 0.1.1

I have apache set up as per Coda's configuration on his blog posting from several months back.

http://blog.codahale.com/2006/06/19/time-for-a-grown-up-server-rails-mongrel-apache-capistrano-and-you/

I have 4 mongrels in my cluster. Things work fine for periods of time, but after several hours of inactivity (I think 8 hours or so) I experience oddness where only 1 of the 4 mongrels responds properly. I end up getting a "500 internal server error" on 3 out of 4 requests as they round robin from mongrel to mongrel. There is nothing in the production log file nor in the mongrel log. I've reproduced this problem on my staging box as well as my production box.

The last time I reproduced the problem I decided to run "top" and see what's going on when I hit the server. Mongrel does receive every request, but mysql is only active on the 1 request that works. On the other three requests mysql never spikes in CPU usage. Looking at the mysql process list revealed that all of the processes had received the "sleep" command, yet one of them is still working properly.

I haven't played with connection timeouts other than to set the timeout in my application's environment (ActiveRecord::Base.verification_timeout = 14400) as well as the mysql interactive_timeout variable, but it seems that either all the mongrels should work or none of them should. The fact that 1 out of 4 always works is rather puzzling to me.
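One way to narrow down which back ends are returning the 500s is to request a page from each mongrel directly, bypassing Apache and the balancer. The following is only a rough sketch of that idea, not something from the original setup: the ports 8000-8003, the host 127.0.0.1, and the helper names probe_backends and failing_ports are all assumptions made for illustration, so substitute whatever your mongrel_cluster configuration actually uses.

```ruby
require "net/http"

# Hypothetical back-end ports (an assumption); use the ports from your
# own mongrel_cluster configuration.
MONGREL_PORTS = [8000, 8001, 8002, 8003]

# Default fetcher: request "/" from one back end and return the HTTP
# status code as an Integer, or nil if the connection fails or times out.
DEFAULT_FETCH = lambda do |host, port|
  begin
    http = Net::HTTP.new(host, port)
    http.open_timeout = 5
    http.read_timeout = 10
    http.get("/").code.to_i
  rescue StandardError
    nil
  end
end

# Probe every port and return a hash of port => status (nil = no response).
# The fetcher is injectable so the logic can be exercised without a network.
def probe_backends(ports, host = "127.0.0.1", fetch = DEFAULT_FETCH)
  ports.each_with_object({}) do |port, results|
    results[port] = fetch.call(host, port)
  end
end

# Ports whose response was missing or a 5xx server error.
def failing_ports(results)
  results.select { |_port, status| status.nil? || status >= 500 }.keys
end
```

Calling probe_backends(MONGREL_PORTS) and passing the result to failing_ports lists the ports that did not answer cleanly. If the same three ports fail on every run, the problem sits in those mongrel processes (or their database connections) rather than in the Apache balancer.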
Trying a "killall -USR1 mongrel_rails" to turn debug on simply killed the 4 mongrel processes. So now I'm running the cluster in debug mode and am going to let it sit for several hours until it happens again, and hopefully get some idea of where the breakdown is happening. I still think it has to be a mysql connection timeout, but again, the fact that 1 of the 4 always works doesn't lend credence to the timeout theory.

Has anyone experienced this phenomenon themselves? Thanks for any tips/pointers, and thanks Zed for all your hard work with mongrel.

-Michael
http://javathehutt.blogspot.com
_______________________________________________
Mongrel-users mailing list
Mongrel-users@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-users