We are using HA & DRBD for two Zimbra servers.

If a server becomes overloaded (starts receiving a lot of e-mail all at
once), the load average begins to climb.  It takes a few minutes, but
eventually it reaches exactly 30, and at that instant the server reboots
(the default HA action) and the services begin starting on the other server
(which takes several minutes).  If the load then climbs on that server
because of the large backed-up queue, it reboots too and the services fail
back to the first server.  I've seen it take 4 or 5 of these failovers
before it gets "caught up" and stops.

I'm trying to track down what's rebooting the server and how best to handle
it.

   1. HA may be rebooting the server because it missed a heartbeat (I doubt
   this, because it always happens when the load average reaches exactly 30).
   2. HA may be rebooting the server because of the load.  Does HA do this,
   and if so, how do I configure it not to?
   3. Lastly, I may be able to configure Zimbra to reject incoming
   connections above a certain load, but that isn't ideal in our
   case.
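To check whether the reboots really line up with the load average (point 1
vs. point 2), one option is to log the load continuously so the last entries
before a reboot can be inspected afterwards.  Below is a rough Python sketch,
assuming a Linux host where /proc/loadavg is readable.  The log file name, the
5-second interval and the THRESHOLD constant are hypothetical choices for
illustration; none of them come from HA, DRBD or Zimbra.

```python
import os
import time
from typing import Tuple

# Hypothetical threshold, set to the load average at which the reboots were seen.
THRESHOLD = 30.0


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def read_loadavg(path: str = "/proc/loadavg") -> Tuple[float, float, float]:
    """Read and parse the kernel's load average file."""
    with open(path) as f:
        return parse_loadavg(f.read())


def over_threshold(load1: float, threshold: float = THRESHOLD) -> bool:
    """True when the 1-minute load average has reached the threshold."""
    return load1 >= threshold


if __name__ == "__main__":
    # Append one line every 5 seconds; flush and fsync each write so the
    # entries just before a hard reboot are more likely to reach the disk.
    with open("loadavg.log", "a") as log:
        while True:
            load1, load5, load15 = read_loadavg()
            flag = " OVER" if over_threshold(load1) else ""
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} {load1:.2f} {load5:.2f} {load15:.2f}{flag}\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(5)
```

If the last logged line before every reboot shows the 1-minute load at or
just over 30, that would point at something acting on the load rather than
at a missed heartbeat.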

Can someone help shed some light on this?  I would appreciate any comments
or suggestions!

Thanks!

Doug Eubanks
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems