We are using HA and DRBD for two Zimbra servers. If a server becomes overloaded (for example, it begins receiving a lot of e-mail all at once), the load average begins to climb. It takes a few minutes, but eventually it reaches exactly 30, and at that instant the server reboots (the default HA action) and the process begins to start on the other server (which takes several minutes). If the load then begins to climb on that server because of the large backed-up queue, it reboots as well and the process falls back to the first server. I've seen it take 4 or 5 of these failovers before it gets "caught up" and stops.
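To double-check that the reboot really coincides with a load average of 30, something like the following could be left running on each node. This is only a rough sketch, not anything HA or Zimbra provides: the threshold of 30 comes from what I observed above, while the function names, the log path, and the 90% "near threshold" marker are made up for illustration. Each line is flushed and synced to disk so that the last sample should survive the reboot.

```python
import os
import time
from datetime import datetime

# Assumption: 30 is the load average at which the reboot was observed.
THRESHOLD = 30.0


def format_sample(load1, threshold=THRESHOLD, now=None):
    """Return one log line for a 1-minute load average sample."""
    now = now or datetime.now()
    # Mark samples within 10% of the threshold so they are easy to find later.
    flag = " NEAR-THRESHOLD" if load1 >= 0.9 * threshold else ""
    return f"{now.isoformat(timespec='seconds')} load1={load1:.2f}{flag}"


def log_load(path, samples, interval=1.0):
    """Append `samples` load readings to `path`, one every `interval` seconds."""
    with open(path, "a") as fh:
        for _ in range(samples):
            load1 = os.getloadavg()[0]  # 1-minute load average
            fh.write(format_sample(load1) + "\n")
            # Force the line to disk so it is not lost when the node reboots.
            fh.flush()
            os.fsync(fh.fileno())
            time.sleep(interval)
```

For example, `log_load("/var/tmp/loadavg.log", samples=3600, interval=1.0)` would record an hour of samples (the path is hypothetical). Comparing the last line written before a reboot with the timestamps in the HA log should show whether the reboot tracks the load value itself or a missed heartbeat.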
I'm trying to track down what's rebooting the server and how best to handle it.

1. HA may be rebooting the server because it missed a heartbeat. (I doubt this, because it happens when the load average reaches exactly 30.)
2. HA may be rebooting the server because of the load itself. Does HA do this, and if so, how do I configure it not to?
3. Lastly, I may be able to configure Zimbra to reject incoming connections above a certain load, but this isn't ideal in our case.

Can someone help shed some light on this? I would appreciate any comments or suggestions!

Thanks!
Doug Eubanks
