----------------------------------------------------------------
BEFORE YOU POST, search the faq at <http://java.apache.org/faq/>
WHEN YOU POST, include all relevant version numbers, log files,
and configuration files. Don't make us guess your problem!!!
----------------------------------------------------------------

In our configuration, we've also experimented with a variety of ways
to detect that processes are down. Ultimately, in our running systems,
we bypassed the shared memory file altogether and did this: we set up
a set of named virtual hosts, each of which mounts one and only one of
the load-balanced JServ processes. Then we test each individual JServ
process by hitting a simple servlet that returns the string "success",
fetching it with curl and grep'ing the output for "success".

That way, we detect, through a request that comes all the way through
Apache http, that some part of the service has failed. If it has, we
kickstart the specific process. (You could also mark it yourself as
being down - so it will be skipped without having to restart Apache.)

We throw those monitoring scripts into a cron job that runs every
minute. It looks sort of hacky, but it has proven to be much easier to
manage (by setting the frequency with cron and the timeout with curl)
and more reliable than trying to discern when and how mod_jserv
decides that a process is down.

- jae

> -----Original Message-----
> From: billo [mailto:[EMAIL PROTECTED]]
> Sent: Friday, October 27, 2000 11:22 AM
> To: [EMAIL PROTECTED]
> Subject: timeouts between mod_jserv and jserv instances
>
>
> I'm looking at failover issues in our apache jserv deployments.
>
> We keep our jserv instances on separate machines from our web servers,
> and start apache and jserv instances independently. We use apache
> 1.3.12 with jserv 1.1.2.
>
> There are two primary ways a jserv instance can die:
>
> 1. the process can be killed or die (usually explicitly)
>
> 2. the machine the instance is on can power down or lose network
>    connectivity.
>
> The first case is no problem, mod_jserv will get an immediate
> "connection refused" and move on to the next jserv in the balance
> list. The second case is more of a problem; mod_jserv needs to wait
> for the tcp timeout, which is unacceptably long.
>
> I understand that this is a feature of TCP/IP, and I'm not suggesting
> this is a bug in JServ. But we need to deal with it somehow, mostly
> in the case where a machine goes bad and has to be taken offline (or
> crashes hard.)
>
> We *could* do this:
>
> 1. detect that an application server is dead, (thru the SNMP
>    monitoring we have in place, or whatever)
>
> 2. re-generate the load balance config on apache, and restart apache.
>
> The problem with this is that it requires human intervention, and
> there will still be some number of customers left hanging on requests
> to the dead app server instances.
>
> It would be better if somehow the watchdog process could take a
> somewhat aggressive approach with timeouts, and remove servers from
> the balance group if there was no response within some configurable
> amount.
>
> I've been looking through the mod_jserv code, and there's a lot of
> stuff using the apache API timeout stuff, but that seems like it's at
> a higher level: it won't help you get to the next living machine in
> the balance group. Am I understanding this right?
>
> Has anyone else come up with a solution to this issue?
>
> Would it be insane or wrong of me to try to put something in watchdog
> so that if a JServ didn't respond to ping in 5 seconds, it was removed
> from the balance group? (or, if not 5, some configurable small number
> of seconds, much less than 60).
>
> Thanks for any advice/comments.
>
> billo
>
>
> --
> --------------------------------------------------------------
> Please read the FAQ! <http://java.apache.org/faq/>
> To subscribe: [EMAIL PROTECTED]
> To unsubscribe: [EMAIL PROTECTED]
> Search Archives:
> <http://www.mail-archive.com/java-apache-users%40list.working-dogs.com/>
> Problems?: [EMAIL PROTECTED]
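
For anyone who wants to try the per-instance check described in the
reply above, here is a rough sketch of the idea in Python rather than
curl and grep. It is not the script we actually run. The host names,
the servlet path, and the 5-second timeout are made-up placeholders,
and the restart step is left out because it is site-specific.

```python
import http.client
import urllib.request
from typing import Callable, Dict, List

# Hypothetical probe URLs: one named virtual host per JServ process, each
# serving a servlet that returns the string "success". Replace with your own.
PROBES: Dict[str, str] = {
    "jserv1": "http://jserv1.example.com/servlets/Ping",
    "jserv2": "http://jserv2.example.com/servlets/Ping",
}

# Assumed value. This is a per-socket-operation timeout, so it is only
# roughly comparable to a curl timeout setting.
TIMEOUT_SECONDS = 5.0


def fetch(url: str, timeout: float) -> str:
    """Fetch the probe URL through Apache and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def is_up(url: str, timeout: float = TIMEOUT_SECONDS,
          fetcher: Callable[[str, float], str] = fetch) -> bool:
    """Return True only if the probe answers in time and contains 'success'."""
    try:
        return "success" in fetcher(url, timeout)
    except (OSError, http.client.HTTPException):
        # OSError covers refused connections, DNS failures, timeouts and
        # HTTP error statuses (urllib's URLError/HTTPError derive from it).
        return False


def failed_instances(probes: Dict[str, str],
                     timeout: float = TIMEOUT_SECONDS,
                     fetcher: Callable[[str, float], str] = fetch) -> List[str]:
    """Return the names of all instances whose probe did not succeed."""
    return [name for name, url in probes.items()
            if not is_up(url, timeout, fetcher)]
```

A cron job that runs every minute could call failed_instances(PROBES)
and kickstart whatever names come back. The fetcher argument is only
there so the logic can be exercised without a live server.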