Every so often I run into a problem while doing a cluster::restart: a child lingers after receiving the kill signal but never gets restarted. This happens because mongrel does not shut down until either all of its in-flight requests have completed or 60 seconds have passed. The subsequent start command is issued before the child has exited, so that child never gets restarted. This is pretty dangerous: if I have, say, 5 mongrels that are all busy at the time of the restart, every one of them could end up stopping and not starting back up.
I created the attached restarter in the style of the Cluster::Restart class in mongrel_cluster. It iterates through each port in the cluster: it attempts to stop the process nicely, checks whether it still exists, kills it with force if so (after sleeping for a bit), and then starts it back up.

The thing I like most about this is that it works really well with mod_proxy_balancer. By default, the balancer is configured to make one fail-over attempt. As you take down each of these processes, Apache will inevitably run into one that you have stopped but not yet started back up. In that case, it will just try another mongrel. The odds are good that Apache will find a mongrel process that hasn't been stopped yet: for the fail-over to fail, Apache would have to randomly select the next process to be stopped, and that process would have to get stopped in the time it takes to start up the one that originally failed.

Currently, cluster::restart stops all the mongrels at once, so when Apache attempts to fail over, it has a pretty good chance of finding another stopped mongrel. The end user then gets a proxy error.

Any chance of getting this folded into the mongrel_cluster gem in some form?

thanks,
eric
Attachment: serial_restart.rb
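
For anyone reading without the attachment, here is a rough sketch of the per-port loop described in the message. It is not the attached serial_restart.rb. The method name, the keyword arguments, and the polling variant of "sleeping for a bit" are all assumptions made for illustration. The stop, kill, and start actions are passed in as callables so the loop does not depend on any particular mongrel command line.

```ruby
# Hypothetical sketch; none of these names come from mongrel_cluster.
# Restart one port at a time so the balancer always has live backends.
def serial_restart(ports, stop:, alive:, kill:, start:,
                   grace: 5, poll: 0.5, sleeper: ->(s) { sleep(s) })
  ports.each do |port|
    stop.call(port) # ask nicely first

    # Give the process up to `grace` seconds to finish its requests.
    waited = 0.0
    while alive.call(port) && waited < grace
      sleeper.call(poll)
      waited += poll
    end

    kill.call(port) if alive.call(port) # still there: use force
    start.call(port)                    # only now bring this port back up
  end
end

# One possible `alive` check for a known pid: signal 0 only tests for
# existence and does not deliver a signal to the process.
def pid_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but belongs to another user
end
```

In real use, the stop, kill, and start callables would wrap whatever commands you already run for a single mongrel, and alive would read that port's pid file and call something like pid_alive?. The point of the ordering is that a port is never started until its old process is confirmed gone, and at most one port is down at any moment.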
