I'm looking at failover issues in our Apache JServ deployments. We keep our JServ instances on separate machines from our web servers, and start Apache and JServ instances independently. We use Apache 1.3.12 with JServ 1.1.2.

There are two primary ways a JServ instance can die:

1. the process can be killed or die (usually explicitly)
2. the machine the instance is on can power down or lose network connectivity.

The first case is no problem: mod_jserv will get an immediate "connection refused" and move on to the next JServ in the balance list.

The second case is more of a problem: mod_jserv needs to wait for the TCP timeout, which is unacceptably long. I understand that this is a feature of TCP/IP, and I'm not suggesting this is a bug in JServ. But we need to deal with it somehow, mostly in the case where a machine goes bad and has to be taken offline (or crashes hard).

We *could* do this:

1. detect that an application server is dead (through the SNMP monitoring we have in place, or whatever)
2. regenerate the load balance config on Apache, and restart Apache.

The problem with this is that it requires human intervention, and there will still be some number of customers left hanging on requests to the dead app server instances.

It would be better if the watchdog process could take a somewhat aggressive approach with timeouts, and remove servers from the balance group if there was no response within some configurable amount of time.

I've been looking through the mod_jserv code, and there's a lot of stuff using the Apache API timeout stuff, but that seems like it's at a higher level: it won't help you get to the next living machine in the balance group.
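
To make the idea concrete, here is a rough Python sketch of the kind of check I have in mind. It is not mod_jserv code: the `probe` and `prune_balance_group` names, the host names, and the port are made up for illustration, and it only tests whether a TCP connection opens within the limit, not whether JServ answers a real ping.

```python
import socket


def probe(host, port, timeout_s=5.0):
    """Return True if a TCP connection to (host, port) opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # A dead process fails at once with "connection refused".
        # A dead or unreachable machine fails once timeout_s runs out,
        # instead of waiting for the much longer default TCP timeout.
        return False


def prune_balance_group(targets, timeout_s=5.0, probe_fn=probe):
    """Split (host, port) targets into (alive, dead) lists, keeping order."""
    alive, dead = [], []
    for host, port in targets:
        if probe_fn(host, port, timeout_s):
            alive.append((host, port))
        else:
            dead.append((host, port))
    return alive, dead


if __name__ == "__main__":
    # Hypothetical balance group; replace with real JServ hosts and ports.
    group = [("app1.example.com", 8007), ("app2.example.com", 8007)]
    alive, dead = prune_balance_group(group, timeout_s=5.0)
    print("alive:", alive)
    print("dead:", dead)
```

The point of the sketch is only that both failure cases end up bounded by the same small, configurable number of seconds, so a machine that has dropped off the network is treated the same way as a killed process.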
Am I understanding this right? Has anyone else come up with a solution to this issue?

Would it be insane or wrong of me to try to put something in the watchdog so that if a JServ didn't respond to a ping within 5 seconds, it was removed from the balance group? (Or, if not 5, some configurable small number of seconds, much less than 60.)

Thanks for any advice/comments.

billo